Abstract

The paper studies a problem of constructing simultaneous likelihood-based confidence sets. We consider a simultaneous multiplier bootstrap procedure for estimating the quantiles of the joint distribution of the likelihood ratio statistics, and for adjusting the confidence level for multiplicity. Theoretical results state the bootstrap validity in the following setting: the sample size $n$ is fixed, and the maximal parameter dimension $p_{\max}$ and the number of considered parametric models $K$ are such that $(\log K)^{12} p_{\max}^3/n$ is small. We also consider the situation when the parametric models are misspecified. If the models' misspecification is significant, then the bootstrap critical values exceed the true ones and the simultaneous bootstrap confidence set becomes conservative. Numerical experiments for local constant and local quadratic regressions illustrate the theoretical results.
1 Introduction

The problem of simultaneous confidence estimation appears in numerous practical applications when a confidence statement has to be made simultaneously for a collection of objects, e.g. in safety analysis in clinical trials, gene expression analysis, population biology, functional magnetic resonance imaging and many others. See e.g. Miller (1981); Westfall (1993); Manly (2006); Benjamini (2010); Dickhaus (2014), and references therein. This problem is also closely related to the construction of simultaneous confidence bands in curve estimation, which goes back to Working and Hotelling (1929). For an extensive literature review on constructing simultaneous confidence bands we refer to Hall and Horowitz (2013), Liu (2010), and Wasserman (2006).

A simultaneous confidence set requires a probability bound to be constructed jointly for several possibly dependent statistics. Therefore, the critical values of the corresponding statistics should be chosen so that the joint probability distribution achieves a required family-wise confidence level. This choice can be made by a multiplicity correction of the marginal confidence levels. The Bonferroni correction method (Bonferroni (1936)) uses a probability union bound: the corrected marginal significance levels are taken equal to the total level divided by the number of models. This procedure can be very conservative if the considered statistics are positively correlated and if their number is large. The Šidák correction method (Šidák (1967)) is more powerful than the Bonferroni correction; however, it also becomes conservative in the case of a large number of dependent statistics.
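The following sketch makes the contrast concrete. It is written in Python with NumPy and is an illustration only, not part of the paper's method. It computes the Bonferroni level $\alpha/K$ and the Šidák level $1-(1-\alpha)^{1/K}$, then uses Monte Carlo to estimate the family-wise error each produces for $K$ equicorrelated two-sided Gaussian tests. The equicorrelated Gaussian model and the helper names are assumptions chosen for this illustration. Under independence the Šidák level is exact. Under strong positive correlation both corrections give a family-wise error far below $\alpha$.

```python
from statistics import NormalDist

import numpy as np


def bonferroni_level(alpha: float, K: int) -> float:
    """Marginal significance level from the union bound."""
    return alpha / K


def sidak_level(alpha: float, K: int) -> float:
    """Marginal level giving family-wise level exactly alpha for K independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / K)


def family_wise_error(marginal: float, K: int, rho: float,
                      n_sim: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least one |Z_k| exceeds its marginal critical value)
    for K equicorrelated N(0, 1) statistics with correlation rho in [0, 1]."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_sim, 1))          # shared factor -> correlation rho
    own = rng.standard_normal((n_sim, K))             # idiosyncratic parts
    z = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * own
    crit = NormalDist().inv_cdf(1.0 - marginal / 2.0)  # two-sided critical value
    return float(np.mean(np.any(np.abs(z) > crit, axis=1)))


alpha, K = 0.1, 71
for rho in (0.0, 0.9):
    print(rho,
          family_wise_error(bonferroni_level(alpha, K), K, rho),
          family_wise_error(sidak_level(alpha, K), K, rho))
```

Only the correlation parameter `rho` changes between the two runs, so the output isolates the effect of dependence on conservativeness.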

Most of the existing results about simultaneous bootstrap confidence sets and resampling-based multiple testing are asymptotic (with sample size tending to infinity), see e.g. Beran (1988, 1990); Hall and Pittelkow (1990); Härdle and Marron (1991); Shao and Tu (1995); Hall and Horowitz (2013), and Westfall (1993); Dickhaus (2014). The results based on the asymptotic distribution of the maximum of an approximating Gaussian process (see Bickel and Rosenblatt (1973); Johnston (1982); Härdle (1989)) require a huge sample size $n$, since they yield a coverage probability error of order $(\log n)^{-1}$ (see Hall (1991)). Some papers considered an alternative approach in the context of confidence band estimation, based on approximating the underlying empirical processes by their bootstrap counterparts. In particular, Hall (1993) showed that such an approach leads to a significant improvement of the error rate (see also Neumann and Polzehl (1998); Claeskens and Van Keilegom (2003)). Chernozhukov et al. (2014a) constructed honest confidence bands for nonparametric density estimators without requiring the existence of a limit distribution of the supremum of the studentized empirical process. Instead, they used an approximation between the sup-norms of empirical and Gaussian processes, and the anti-concentration property of suprema of Gaussian processes.

In many modern applications the sample size cannot be large and/or can be smaller than the parameter dimension, for example, in genomics, brain imaging, spatial epidemiology and microarray data analysis, see Leek and Storey (2008); Kim and van de Wiel (2008); Arlot et al. (2010); Cao and Kosorok (2011), and references therein.

For recent results on resampling-based simultaneous confidence sets in the high-dimensional finite-sample set-up we refer to the papers by Arlot et al. (2010) and Chernozhukov et al. (2013a, 2014a, b). Arlot et al. (2010) considered i.i.d. observations of a Gaussian vector with a dimension possibly much larger than the sample size, and with an unknown covariance matrix. They examined multiple testing problems for the mean values of its coordinates and provided non-asymptotic control of the family-wise error rate using resampling-type procedures. Chernozhukov et al. (2013a) presented a number of non-asymptotic results on Gaussian approximation and multiplier bootstrap for maxima of sums of high-dimensional vectors (with a dimension possibly much larger than the sample size) in a very general set-up. As an application, the authors considered the problem of multiple hypothesis testing in the framework of approximate means. They derived non-asymptotic results for the general stepdown procedure by Romano and Wolf (2005) with improved error rates and in a high-dimensional setting. Chernozhukov et al. (2014a) showed how this technique applies to the problem of constructing an honest confidence set in nonparametric density estimation. Chernozhukov et al. (2014b) extended the results from maxima to the class of sparsely convex sets.

The present paper studies simultaneous likelihood-based bootstrap confidence sets in 
the following setting: 

1. the sample size $n$ is fixed;

2. the parametric models can be misspecified;

3. the number $K$ of the parametric models can be exponentially large w.r.t. $n$;

4. the maximal dimension $p_{\max}$ of the considered parametric models can depend on the sample size $n$.

This set-up, in contrast with the paper by Chernozhukov et al. (2014b), does not require the sparsity condition. In particular, the dimensions $p_1, \ldots, p_K$ of the parametric families may grow with the sample size. Moreover, the simultaneous likelihood-based confidence sets are not necessarily convex, and the parametric assumption can be violated.

The considered simultaneous multiplier bootstrap procedure involves two main steps: estimation of the quantile functions of the likelihood ratio statistics, and multiplicity correction of the marginal confidence level. Theoretical results of the paper state the bootstrap validity in the setting 1-4, taking into account the multiplicity correction. The resulting approximation bound requires the quantity $(\log K)^{12} p_{\max}^3/n$ to be small. The log-factor here is suboptimal and can probably be improved. The paper particularly focuses on the impact of model misspecification. We distinguish between slight and strong misspecifications. Under the so-called small modeling bias condition (SmB) given in Section 5.2, the bootstrap approximation is accurate. This condition roughly means that all the parametric models are close to the true distribution. If the (SmB) condition is not fulfilled, then the simultaneous bootstrap confidence set is still applicable; however, it becomes conservative. This property is confirmed by the numerical experiments in Section 4.

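Let the random data

$$ \mathbf{Y} \stackrel{\text{def}}{=} (Y_1, \ldots, Y_n) \qquad (1.1) $$

consist of independent observations $Y_i$ and belong to the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The sample size $n$ is fixed. $\mathbb{P}$ is an unknown probability distribution of the sample $\mathbf{Y}$. Consider $K$ regular parametric families of probability distributions:

$$ \{\mathbb{P}_k(\theta)\} \stackrel{\text{def}}{=} \{\mathbb{P}_k(\theta) \ll \mu_0,\ \theta \in \Theta_k \subset \mathbb{R}^{p_k}\}, \qquad k = 1, \ldots, K. $$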


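Each parametric family induces the quasi log-likelihood function for $\theta \in \Theta_k \subset \mathbb{R}^{p_k}$:

$$ L_k(\mathbf{Y}, \theta) \stackrel{\text{def}}{=} \log\Bigl(\frac{d\mathbb{P}_k(\theta)}{d\mu_0}(\mathbf{Y})\Bigr) = \sum_{i=1}^n \log\Bigl(\frac{d\mathbb{P}_k(\theta)}{d\mu_0}(Y_i)\Bigr). \qquad (1.2) $$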


It is important that we do not require $\mathbb{P}$ to belong to any of the known parametric families $\{\mathbb{P}_k(\theta)\}$; that is why the term quasi log-likelihood is used here. Below in this section we consider two popular examples of simultaneous confidence sets in terms of the quasi log-likelihood functions (1.2), namely the simultaneous confidence band for local constant regression and multiple quantiles regression.

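The target of estimation for the misspecified log-likelihood $L_k(\theta)$ is the parameter $\theta_k^*$ that minimises the Kullback-Leibler distance between the unknown true measure $\mathbb{P}$ and the parametric family $\{\mathbb{P}_k(\theta)\}$:

$$ \theta_k^* \stackrel{\text{def}}{=} \operatorname*{argmax}_{\theta \in \Theta_k} \mathbb{E} L_k(\theta). \qquad (1.3) $$

The maximum likelihood estimator is defined as:

$$ \tilde{\theta}_k \stackrel{\text{def}}{=} \operatorname*{argmax}_{\theta \in \Theta_k} L_k(\theta). $$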
The parametric sets $\Theta_k$ have dimensions $p_k$; therefore, $\tilde{\theta}_k, \theta_k^* \in \mathbb{R}^{p_k}$. For $1 \le k, j \le K$ and $k \ne j$ the numbers $p_k$ and $p_j$ can be unequal.

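The likelihood-based confidence set for the target parameter $\theta_k^*$ is

$$ \mathcal{E}_k(\mathfrak{z}) \stackrel{\text{def}}{=} \bigl\{\theta \in \Theta_k : L_k(\tilde{\theta}_k) - L_k(\theta) \le \mathfrak{z}^2/2\bigr\} \subset \mathbb{R}^{p_k}. \qquad (1.4) $$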

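Let $\mathfrak{z}_k(\alpha)$ denote the $(1-\alpha)$-quantile of the corresponding square-root likelihood ratio statistic:

$$ \mathfrak{z}_k(\alpha) \stackrel{\text{def}}{=} \inf\bigl\{\mathfrak{z} \ge 0 : \mathbb{P}\bigl(L_k(\tilde{\theta}_k) - L_k(\theta_k^*) > \mathfrak{z}^2/2\bigr) \le \alpha\bigr\}. \qquad (1.5) $$

Together with (1.4) this implies for each $k = 1, \ldots, K$:

$$ \mathbb{P}\bigl(\theta_k^* \in \mathcal{E}_k(\mathfrak{z}_k(\alpha))\bigr) \ge 1 - \alpha. \qquad (1.6) $$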


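Thus $\mathcal{E}_k(\mathfrak{z})$ and the quantile function $\mathfrak{z}_k(\alpha)$ fully determine the marginal $(1-\alpha)$-confidence set. The simultaneous confidence set requires a correction for multiplicity. Let $c(\alpha)$ denote the maximal number $c \in (0, \alpha]$ such that

$$ \mathbb{P}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} > \mathfrak{z}_k(c)\Bigr\}\Bigr) \le \alpha. \qquad (1.7) $$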

This is equivalent to

$$ c(\alpha) \stackrel{\text{def}}{=} \sup\Bigl\{c \in (0, \alpha] : \mathbb{P}\Bigl(\max_{1 \le k \le K}\Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} - \mathfrak{z}_k(c)\Bigr\} > 0\Bigr) \le \alpha\Bigr\}. \qquad (1.8) $$

Therefore, taking the marginal confidence sets with the same confidence levels $1 - c(\alpha)$ yields the simultaneous confidence bound of the total level $1 - \alpha$. The value $c(\alpha) \in (0, \alpha]$ is the correction for multiplicity. In order to construct the simultaneous confidence set using this correction, one has to estimate the values $\mathfrak{z}_k(c(\alpha))$ for all $k = 1, \ldots, K$. By its definition this problem splits into two subproblems:


1. Marginal step. Estimation of the marginal quantile functions $\mathfrak{z}_1(\alpha), \ldots, \mathfrak{z}_K(\alpha)$ given in (1.5).

2. Correction for multiplicity. Estimation of the correction for multiplicity $c(\alpha)$ given in (1.8).

If the first problem is solved for any $\alpha \in (0, 1)$, the second problem can be treated by calibrating the marginal level $c$ so that (1.8) holds. It is important to take into account the correlation between the likelihood ratio statistics $L_k(\tilde{\theta}_k) - L_k(\theta_k^*)$, $k = 1, \ldots, K$; otherwise the estimate of the correction $c(\alpha)$ can be too conservative. For instance, the Bonferroni correction would lead to the marginal confidence level $1 - \alpha/K$, which may be very conservative if $K$ is large and the statistics $L_k(\tilde{\theta}_k) - L_k(\theta_k^*)$ are highly correlated.
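The calibration in (1.8) can be illustrated on simulated statistics. The following Python sketch (NumPy; the helper names are hypothetical) replaces $\mathbb{P}$ by the empirical distribution of a matrix of draws and uses empirical marginal quantiles for $\mathfrak{z}_k(c)$. It then finds the largest admissible $c$ by bisection. Bisection is valid here because the family-wise exceedance frequency is non-decreasing in $c$. As stand-ins for the square-root likelihood ratio statistics, the sketch uses $|Z_k|$ for equicorrelated standard Gaussians. This choice is an assumption made only for the illustration; for a one-dimensional Gaussian location model with unit variance, $|Z_k|$ is exactly the square-root likelihood ratio.

```python
import numpy as np


def fwe_at_level(stats: np.ndarray, c: float) -> float:
    """Empirical P(some column of `stats` exceeds its own empirical (1 - c)-quantile)."""
    z = np.quantile(stats, 1.0 - c, axis=0)          # marginal quantiles z_k(c)
    return float(np.mean(np.any(stats > z, axis=1)))


def multiplicity_correction(stats: np.ndarray, alpha: float, n_iter: int = 40) -> float:
    """Largest c in (0, alpha] with fwe_at_level(stats, c) <= alpha, found by bisection."""
    if fwe_at_level(stats, alpha) <= alpha:
        return alpha
    lo, hi = 0.0, alpha                               # lo is always admissible
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if fwe_at_level(stats, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


def equicorrelated_abs_gaussians(n_sim, K, rho, rng):
    """|Z_k| for K equicorrelated N(0, 1) variables (illustrative stand-in statistics)."""
    common = rng.standard_normal((n_sim, 1))
    own = rng.standard_normal((n_sim, K))
    return np.abs(np.sqrt(rho) * common + np.sqrt(1.0 - rho) * own)


rng = np.random.default_rng(1)
alpha, K = 0.1, 20
c_indep = multiplicity_correction(equicorrelated_abs_gaussians(20_000, K, 0.0, rng), alpha)
c_corr = multiplicity_correction(equicorrelated_abs_gaussians(20_000, K, 0.9, rng), alpha)
print(c_indep, c_corr, alpha / K)
```

With independent statistics the result is close to the Šidák level. With strong positive correlation it is several times larger than the Bonferroni level $\alpha/K$.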

In Section 2 we suggest a multiplier bootstrap procedure, which performs steps 1 and 2 described above. Theoretical justification of the procedure is given in Section 3. The proofs are based on several approximation bounds: the non-asymptotic square-root Wilks theorem, simultaneous Gaussian approximation for $\ell_2$-norms, Gaussian comparison, and a simultaneous Gaussian anti-concentration inequality.

Spokoiny and Zhilova (2014) considered the first subproblem for the case of a single parametric model ($K = 1$). There, a multiplier bootstrap procedure was applied to construct a likelihood-based confidence set and was justified theoretically for a fixed sample size and a possibly misspecified parametric model. In the present paper we extend that approach to the case of many parametric models considered simultaneously.

Below we illustrate the definitions (1.2)-(1.8) of the simultaneous likelihood-based 
confidence sets with two popular examples. 

Example 1 (Simultaneous confidence band for local constant regression): Let $Y_1, \ldots, Y_n$ be independent random scalar observations and $X_1, \ldots, X_n$ some deterministic design points. Consider the following quadratic likelihood function reweighted with the kernel functions $K(\cdot)$:

$$ L(\theta, x, h) \stackrel{\text{def}}{=} -\frac{1}{2}\sum_{i=1}^n (Y_i - \theta)^2 w_i(x, h), $$

$$ w_i(x, h) \stackrel{\text{def}}{=} K(\{x - X_i\}/h), $$

$$ K(x) \in [0, 1], \quad \int_{\mathbb{R}} K(x)\,dx = 1, \quad K(x) = K(-x). $$

Here $h > 0$ denotes the bandwidth, the local smoothing parameter. The target point and the local MLE read as:

$$ \theta^*(x, h) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^n w_i(x, h)\,\mathbb{E} Y_i}{\sum_{i=1}^n w_i(x, h)}, \qquad \tilde{\theta}(x, h) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^n w_i(x, h)\,Y_i}{\sum_{i=1}^n w_i(x, h)}. $$

$\tilde{\theta}(x, h)$ is also known as the Nadaraya-Watson estimate. Fix a bandwidth $h$ and consider the range of points $x_1, \ldots, x_K$. They yield $K$ local constant models with the target parameters $\theta_k^* \stackrel{\text{def}}{=} \theta^*(x_k, h)$ and the likelihood functions $L_k(\theta) \stackrel{\text{def}}{=} L(\theta, x_k, h)$ for $k = 1, \ldots, K$. The confidence intervals for each model are defined as

$$ \mathcal{E}_k(\mathfrak{z}, h) \stackrel{\text{def}}{=} \bigl\{\theta \in \Theta : L(\tilde{\theta}(x_k, h), x_k, h) - L(\theta, x_k, h) \le \mathfrak{z}^2/2\bigr\}. $$

For the quantile functions $\mathfrak{z}_k(\alpha)$ and the multiplicity correction $c(\alpha)$ from (1.5) and (1.8), these intervals form the following simultaneous confidence band:

$$ \mathbb{P}\Bigl(\bigcap_{k=1}^K \bigl\{\theta_k^* \in \mathcal{E}_k(\mathfrak{z}_k(c(\alpha)), h)\bigr\}\Bigr) \ge 1 - \alpha. $$

In Section 4 we provide results of numerical experiments for this model.
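Below is a minimal Python sketch of Example 1 (NumPy; the function names are hypothetical). It uses the Epanechnikov kernel, which is one admissible choice of $K(\cdot)$ and the one used in Section 4. Because the likelihood is quadratic in $\theta$, $L(\tilde\theta(x,h),x,h) - L(\theta,x,h) = \tfrac12 (\tilde\theta(x,h)-\theta)^2 \sum_i w_i(x,h)$. Each confidence set $\mathcal{E}_k(\mathfrak{z},h)$ is therefore the interval $\tilde\theta(x_k,h) \pm \mathfrak{z}/\sqrt{\sum_i w_i(x_k,h)}$. The value of $\mathfrak{z}$ below is a placeholder; in the procedure it would be a (bootstrap) critical value. The data are synthetic.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: symmetric, values in [0, 1], integrates to one."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_loglik(theta, x, X, Y, h):
    """L(theta, x, h) = -1/2 * sum_i (Y_i - theta)^2 w_i(x, h)."""
    w = epanechnikov((x - X) / h)
    return -0.5 * np.sum((Y - theta) ** 2 * w)


def nadaraya_watson(x, X, Y, h):
    """Local MLE theta_tilde(x, h) = sum_i w_i Y_i / sum_i w_i."""
    w = epanechnikov((x - X) / h)
    return np.sum(w * Y) / np.sum(w)


def lr_interval(x, X, Y, h, z):
    """E(z, h) = {theta : L(theta_tilde) - L(theta) <= z^2 / 2}.
    Since L(theta_tilde) - L(theta) = (theta_tilde - theta)^2 * sum_i w_i / 2,
    this is the interval theta_tilde +/- z / sqrt(sum_i w_i)."""
    w = epanechnikov((x - X) / h)
    center = np.sum(w * Y) / np.sum(w)
    half = z / np.sqrt(np.sum(w))
    return center - half, center + half


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 400)
Y = np.sin(2 * np.pi * X) + rng.standard_normal(X.size)   # illustrative data only
lo, hi = lr_interval(0.5, X, Y, h=0.12, z=1.5)             # z = 1.5 is a placeholder
```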

Example 2 (Multiple quantiles regression): Quantile regression is an important method of statistical analysis, widely used in various applications. It aims at estimating conditional quantile functions of a response variable, see Koenker (2005). The multiple quantiles regression model considers several quantile regression functions simultaneously, based on a range of quantile indices, see e.g. Liu and Wu (2011); Qu (2008); He (1997). Let $Y_1, \ldots, Y_n$ be independent random scalar observations and $X_1, \ldots, X_n \in \mathbb{R}^d$ some deterministic design points, as in Example 1. Consider the following quantile regression models for $k = 1, \ldots, K$:

$$ Y_i = g_k(X_i) + \varepsilon_{k,i}, \qquad i = 1, \ldots, n, $$

where $g_k(x): \mathbb{R}^d \to \mathbb{R}$ are unknown functions, the random values $\varepsilon_{k,1}, \ldots, \varepsilon_{k,n}$ are independent for each fixed $k$, and

$$ \mathbb{P}(\varepsilon_{k,i} < 0) = \tau_k \quad \text{for all } i = 1, \ldots, n. $$

The range of quantile indices $\tau_1, \ldots, \tau_K \in (0, 1)$ is known and fixed. We are interested in simultaneous parametric confidence sets for the functions $g_1(\cdot), \ldots, g_K(\cdot)$. Let $f_k(x, \theta): \mathbb{R}^d \times \mathbb{R}^{p_k} \to \mathbb{R}$ be known regression functions. Using the quantile regression approach by Koenker and Bassett Jr (1978), this problem can be treated with the quasi maximum likelihood method and the following log-likelihood functions:

$$ L_k(\theta) = -\sum_{i=1}^n \rho_{\tau_k}\bigl(Y_i - f_k(X_i, \theta)\bigr), \qquad \rho_{\tau_k}(x) \stackrel{\text{def}}{=} x\bigl(\tau_k - \mathbb{1}\{x < 0\}\bigr), $$

for $k = 1, \ldots, K$. This quasi log-likelihood function corresponds to the asymmetric Laplace distribution with the density $\tau_k(1 - \tau_k)e^{-\rho_{\tau_k}(x - a)}$. If $\tau_k = 1/2$, then $\rho_{1/2}(x) = |x|/2$ and $L_k(\theta) = -\sum_{i=1}^n |Y_i - f_k(X_i, \theta)|/2$, which corresponds to median regression.
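The check function $\rho_\tau$ and the quasi log-likelihood of Example 2 can be sketched in Python as follows (NumPy; hypothetical function names). For illustration only, the sketch uses the constant model $f_k(x,\theta) = \theta$, which is an assumption and not one of the paper's models. For this model the maximiser of $L_k$ is a sample $\tau$-quantile. The sketch also checks the median case $\rho_{1/2}(x) = |x|/2$.

```python
import numpy as np


def check_loss(x, tau):
    """rho_tau(x) = x * (tau - 1{x < 0})."""
    x = np.asarray(x, dtype=float)
    return x * (tau - (x < 0))


def quantile_loglik(theta, Y, tau):
    """L(theta) = -sum_i rho_tau(Y_i - theta) for the constant model f(x, theta) = theta."""
    return -np.sum(check_loss(Y - theta, tau))


rng = np.random.default_rng(0)
Y = rng.standard_normal(1001)
tau = 0.25
# The criterion is piecewise linear with kinks at the data points, so its
# maximum over theta is attained at one of the observations.
values = [quantile_loglik(t, Y, tau) for t in Y]
theta_hat = Y[int(np.argmax(values))]
```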

The paper is organised as follows. Section 2 describes the multiplier bootstrap procedure. Section 3 explains the ideas of the theoretical approach (Section 3.1) and provides the main results (Section 3.2). All the necessary conditions are given in Section 5. In Section 5.3 and in the statements of the main theoretical results we describe how the involved terms depend on the sample size and parametric dimensions in the case of i.i.d. observations. Proofs of the main results are given in Section C. Statements from Sections A and B are used for the proofs in Section C. Numerical experiments are described in Section 4: we construct simultaneous confidence corridors for local constant and local quadratic regressions using both bootstrap and Monte Carlo procedures. The quality of the bootstrap procedure is checked by computing the effective simultaneous coverage probabilities of the bootstrap confidence sets. We also compare the widths of the confidence bands and the values of the multiplicity correction obtained with the bootstrap and with Monte Carlo procedures. The experiments confirm that the multiplier bootstrap and the bootstrap multiplicity correction become conservative if the local parametric model is considerably misspecified.

The results given here are valid on a random set of probability $1 - Ce^{-\mathtt{x}}$ for some explicit constant $C > 0$. The number $\mathtt{x} > 0$ determines this dominating probability level. For the case of i.i.d. observations (see Section 5.3) we take $\mathtt{x} = C\log n$. Throughout the text $\|\cdot\|$ denotes the Euclidean norm for a vector and the spectral norm for a matrix; $\|\cdot\|_{\max}$ is the maximal absolute value of the elements of a vector (or a matrix), and

$$ p_{\mathrm{sum}} \stackrel{\text{def}}{=} p_1 + \cdots + p_K, \qquad p_{\max} \stackrel{\text{def}}{=} \max_{1 \le k \le K} p_k. $$

2 The multiplier bootstrap procedure 

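Let $\ell_{i,k}(\theta)$ denote the log-density from the $k$-th parametric distribution family evaluated at the $i$-th observation:

$$ \ell_{i,k}(\theta) \stackrel{\text{def}}{=} \log\Bigl(\frac{d\mathbb{P}_k(\theta)}{d\mu_0}(Y_i)\Bigr); \qquad (2.1) $$

then, due to the independence of $Y_1, \ldots, Y_n$,

$$ L_k(\theta) = \sum_{i=1}^n \ell_{i,k}(\theta) \qquad \forall k = 1, \ldots, K. $$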

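Consider i.i.d. scalar random variables $u_i$ independent of the data $\mathbf{Y}$, such that $\mathbb{E} u_i = 1$, $\operatorname{Var} u_i = 1$, $\mathbb{E}\exp(u_i) < \infty$ (e.g. $u_i \sim \mathcal{N}(1, 1)$ or $u_i \sim \exp(1)$ or $u_i \sim 2\,\mathrm{Bernoulli}(0.5)$). Multiply the summands of the likelihood function $L_k(\theta)$ with the new random variables:

$$ L_k^{\circ}(\theta) \stackrel{\text{def}}{=} \sum_{i=1}^n \ell_{i,k}(\theta)\,u_i; \qquad (2.2) $$

then it holds $\mathbb{E}^{\circ} L_k^{\circ}(\theta) = L_k(\theta)$, where $\mathbb{E}^{\circ}$ stands for the conditional expectation given $\mathbf{Y}$.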
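The three weight distributions mentioned above, and the identity $\mathbb{E}^{\circ} L_k^{\circ}(\theta) = L_k(\theta)$, can be checked numerically with a short Python sketch (NumPy; the function names are hypothetical). The summands $\ell_i$ here are a toy Gaussian log-density (up to a constant) evaluated at one fixed $\theta$, an assumption made for illustration only. Averaging many draws of $L^{\circ}$ over the weights recovers $L$.

```python
import numpy as np


def draw_weights(kind, size, rng):
    """Bootstrap multipliers with E u = 1 and Var u = 1."""
    if kind == "gaussian":
        return rng.normal(1.0, 1.0, size)          # N(1, 1)
    if kind == "exponential":
        return rng.exponential(1.0, size)          # Exp(1)
    if kind == "bernoulli":
        return 2.0 * rng.binomial(1, 0.5, size)    # 2 * Bernoulli(0.5): values 0 or 2
    raise ValueError(f"unknown weight distribution: {kind}")


def bootstrap_loglik(ell, u):
    """L°(theta) = sum_i ell_i(theta) u_i for precomputed ell_i(theta); u may have shape (B, n)."""
    return u @ ell


rng = np.random.default_rng(0)
Y = rng.standard_normal(50)
ell = -0.5 * (Y - 0.3) ** 2                  # toy summands ell_i(theta) at theta = 0.3
U = draw_weights("gaussian", (20_000, Y.size), rng)
L_boot = bootstrap_loglik(ell, U)            # 20 000 draws of L°(0.3) given the data
```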

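Therefore, the quasi MLE for the $\mathbf{Y}$-world is the target parameter for the bootstrap world for each $k = 1, \ldots, K$:

$$ \operatorname*{argmax}_{\theta \in \Theta_k} \mathbb{E}^{\circ} L_k^{\circ}(\theta) = \operatorname*{argmax}_{\theta \in \Theta_k} L_k(\theta) = \tilde{\theta}_k. $$

The corresponding bootstrap MLE is:

$$ \tilde{\theta}_k^{\circ} \stackrel{\text{def}}{=} \operatorname*{argmax}_{\theta \in \Theta_k} L_k^{\circ}(\theta). $$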
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-—- o -—- 

The k-th likelihood ratio statistic in the bootstrap world equals to L k (G k ) — L k (G k ), 
where all the elements: the function L°(G) and the arguments G k , 6 k are known and 
available for computation. This means, that given the data T , one can estimate the 

- Q -—' 

distribution or quantiles of the statistic L° ( G k )—L k (G k ) by generating many independent 
samples of the bootstrap weights ui,...,u n and computing with them the bootstrap 
likelihood ratio. 

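Let us introduce, similarly to (1.5), the $(1-\alpha)$-quantile of the bootstrap square-root likelihood ratio statistic:

$$ \mathfrak{z}_k^{\circ}(\alpha) \stackrel{\text{def}}{=} \inf\bigl\{\mathfrak{z} \ge 0 : \mathbb{P}^{\circ}\bigl(L_k^{\circ}(\tilde{\theta}_k^{\circ}) - L_k^{\circ}(\tilde{\theta}_k) > \mathfrak{z}^2/2\bigr) \le \alpha\bigr\}; \qquad (2.3) $$

here $\mathbb{P}^{\circ}$ denotes the probability measure conditional on the data $\mathbf{Y}$; therefore, $\mathfrak{z}_k^{\circ}(\alpha)$ is a random value dependent on $\mathbf{Y}$.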

Spokoiny and Zhilova (2014) considered the case of a single parametric model ($K = 1$). They showed that the bootstrap quantile $\mathfrak{z}_k^{\circ}(\alpha)$ is close to the true one $\mathfrak{z}_k(\alpha)$ under the so-called “Small Modeling Bias” (SmB) condition, which is fulfilled when the true distribution is close to the parametric family or when the observations are i.i.d. When the (SmB) condition does not hold, the bootstrap quantile is still valid; however, it becomes conservative. Therefore, for each fixed $k = 1, \ldots, K$ the bootstrap quantiles $\mathfrak{z}_k^{\circ}(\alpha)$ are rather good estimates of the true unknown ones $\mathfrak{z}_k(\alpha)$. However, they are still “pointwise” in $k$, i.e. the confidence bounds (1.6) hold for each $k$ separately. Our goal here is to estimate $\mathfrak{z}_1(\alpha), \ldots, \mathfrak{z}_K(\alpha)$ and $c(\alpha)$ according to (1.7) and (1.8). Let us introduce the bootstrap correction for multiplicity:

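$$ c^{\circ}(\alpha) \stackrel{\text{def}}{=} \sup\Bigl\{c \in (0, \alpha] : \mathbb{P}^{\circ}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)} > \mathfrak{z}_k^{\circ}(c)\Bigr\}\Bigr) \le \alpha\Bigr\}. \qquad (2.4) $$

By its definition $c^{\circ}(\alpha)$ depends on the random sample $\mathbf{Y}$.

The multiplier bootstrap procedure below explains how to estimate the bootstrap quantile functions $\mathfrak{z}_k^{\circ}(c^{\circ}(\alpha))$ corrected for multiplicity.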


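The simultaneous bootstrap procedure:

Input: The data $\mathbf{Y}$ (as in (1.1)) and a fixed confidence level $(1 - \alpha) \in (0, 1)$.

Step 1: Generate $B$ independent samples of i.i.d. bootstrap weights $\{u_1^{(b)}, \ldots, u_n^{(b)}\}$, $b = 1, \ldots, B$. For the bootstrap likelihood processes

$$ L_k^{\circ(b)}(\theta) \stackrel{\text{def}}{=} \sum_{i=1}^n \ell_{i,k}(\theta)\,u_i^{(b)}, \qquad (2.5) $$

compute the bootstrap likelihood ratios $L_k^{\circ(b)}(\tilde{\theta}_k^{\circ(b)}) - L_k^{\circ(b)}(\tilde{\theta}_k)$. For each fixed $b$ the bootstrap likelihoods $L_1^{\circ(b)}(\theta), \ldots, L_K^{\circ(b)}(\theta)$ are computed using the same bootstrap sample $\{u_i^{(b)}\}$, so that the $i$-th summand $\ell_{i,k}(\theta)$ is always multiplied with the $i$-th weight $u_i^{(b)}$, as in (2.5).

Step 2: Estimate the marginal quantile functions $\mathfrak{z}_k^{\circ}(\alpha)$ defined in (2.3) separately for each $k = 1, \ldots, K$, using the $B$ bootstrap realisations of $L_k^{\circ}(\tilde{\theta}_k^{\circ}) - L_k^{\circ}(\tilde{\theta}_k)$ from Step 1.

Step 3: Find by an iterative procedure the maximum value $c \in (0, \alpha]$ such that

$$ \mathbb{P}^{\circ}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)} > \mathfrak{z}_k^{\circ}(c)\Bigr\}\Bigr) \le \alpha. $$

Output: The resulting critical values are $\mathfrak{z}_k^{\circ}(c)$, $k = 1, \ldots, K$.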
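The three steps can be sketched in Python for the local constant models of Example 1. This is a sketch under stated assumptions: it uses NumPy, and the helper names, the synthetic data, $B = 2000$, and bisection as the "iterative procedure" of Step 3 are all choices made here, not taken from the paper. For this quadratic likelihood the bootstrap MLE and likelihood ratio have closed forms:

$$\tilde\theta_k^{\circ} = \frac{\sum_i w_i u_i Y_i}{\sum_i w_i u_i}, \qquad \sqrt{2L_k^{\circ}(\tilde\theta_k^{\circ}) - 2L_k^{\circ}(\tilde\theta_k)} = |\tilde\theta_k^{\circ} - \tilde\theta_k| \sqrt{\textstyle\sum_i w_i u_i},$$

valid when $\sum_i w_i u_i > 0$. A general model would instead need a numerical maximisation of each $L_k^{\circ(b)}$.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def bootstrap_sqrt_lr(X, Y, grid, h, B, rng):
    """Step 1 for K local constant models centred at the points in `grid`.

    Returns a (B, K) array whose row b holds
    sqrt(2 L°_k(theta°_k) - 2 L°_k(theta_tilde_k)), k = 1..K, all computed
    with the same weight sample u^(b), plus theta_tilde and the kernel weights.
    """
    W = epanechnikov((grid[:, None] - X[None, :]) / h)   # (K, n): w_i(x_k, h)
    theta_tilde = W @ Y / W.sum(axis=1)                   # Nadaraya-Watson estimates, (K,)
    U = rng.normal(1.0, 1.0, size=(B, X.size))            # u_i^(b) ~ N(1, 1), shared across k
    S = U @ W.T                                           # sum_i w_i(x_k, h) u_i^(b), (B, K)
    if np.any(S <= 0):
        raise ValueError("non-positive bootstrap curvature; closed form not valid")
    theta_boot = (U * Y) @ W.T / S                        # bootstrap MLEs theta°_k, (B, K)
    # Quadratic likelihood: L°(theta°) - L°(theta_tilde) = S (theta° - theta_tilde)^2 / 2.
    return np.abs(theta_boot - theta_tilde) * np.sqrt(S), theta_tilde, W


def exceedance(stats, c):
    """Step 2: empirical marginal quantiles z°_k(c); also returns the empirical
    probability that at least one statistic exceeds its own quantile."""
    z = np.quantile(stats, 1.0 - c, axis=0)
    return float(np.mean(np.any(stats > z, axis=1))), z


def simultaneous_critical_values(stats, alpha, n_iter=40):
    """Step 3: largest c in (0, alpha] with exceedance <= alpha (bisection)."""
    p, z = exceedance(stats, alpha)
    if p <= alpha:
        return alpha, z
    lo, hi = 0.0, alpha
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if exceedance(stats, mid)[0] <= alpha:
            lo = mid
        else:
            hi = mid
    return lo, exceedance(stats, lo)[1]


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 400)
Y = 5.0 + rng.standard_normal(X.size)         # hypothetical data with constant mean
grid = np.linspace(0.0, 1.0, 71)               # centres x_1, ..., x_71
stats, theta_tilde, W = bootstrap_sqrt_lr(X, Y, grid, h=0.12, B=2000, rng=rng)
c, z = simultaneous_critical_values(stats, alpha=0.1)
# Band: theta_tilde_k +/- z_k / sqrt(sum_i w_i(x_k, h)), by the closed form of E_k(z).
half_width = z / np.sqrt(W.sum(axis=1))
```

Step 3 uses bisection because the exceedance frequency is non-decreasing in $c$; any other monotone search would serve equally well.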


Remark 2.1. Step 1 requires the same bootstrap sample $\{u_i^{(b)}\}$ to be used for generating all the bootstrap likelihood ratios $L_k^{\circ(b)}(\tilde{\theta}_k^{\circ(b)}) - L_k^{\circ(b)}(\tilde{\theta}_k)$, $k = 1, \ldots, K$. This preserves the correlation structure between the ratios and, therefore, allows a sharper simultaneous adjustment in Step 3.
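The effect described in Remark 2.1 is easy to observe numerically. The Python sketch below (NumPy; hypothetical helper names and synthetic data) computes the bootstrap square-root likelihood ratios of two neighbouring local constant models. It does this once with a shared weight sample and once with independent weight samples, and compares their empirical correlation.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def sqrt_lr_pair(X, Y, x1, x2, h, B, rng, shared):
    """Bootstrap sqrt-LR statistics of two local constant models at x1 and x2,
    with shared weights (as required in Step 1) or independent weights per model."""
    W = epanechnikov((np.array([x1, x2])[:, None] - X[None, :]) / h)   # (2, n)
    theta_tilde = W @ Y / W.sum(axis=1)
    U1 = rng.normal(1.0, 1.0, size=(B, X.size))
    U2 = U1 if shared else rng.normal(1.0, 1.0, size=(B, X.size))
    out = []
    for U, w, t in ((U1, W[0], theta_tilde[0]), (U2, W[1], theta_tilde[1])):
        S = U @ w                                   # sum_i w_i u_i for each draw
        out.append(np.abs((U * Y) @ w / S - t) * np.sqrt(S))
    return out


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 400)
Y = 5.0 + rng.standard_normal(X.size)
r_shared = np.corrcoef(*sqrt_lr_pair(X, Y, 0.50, 0.52, 0.12, 5000, rng, shared=True))[0, 1]
r_indep = np.corrcoef(*sqrt_lr_pair(X, Y, 0.50, 0.52, 0.12, 5000, rng, shared=False))[0, 1]
```

With shared weights the two statistics are strongly correlated, as are their $\mathbf{Y}$-world counterparts, which are built from overlapping data. With independent weights this dependence is lost, and Step 3 would behave as if the models were independent, i.e. more conservatively.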

This procedure is justified theoretically in the next section. 


3 Theoretical justification of the bootstrap procedure 

Before stating the main results in Section 3.2 we introduce in Section 3.1 the basic 
ingredients of the proofs. The general scheme of the theoretical approach here is taken 
from Spokoiny and Zhilova (2014). In the present work we extend that approach to the case of many parametric models considered simultaneously.


3.1 Overview of the theoretical approach 

To justify the described multiplier bootstrap procedure for simultaneous inference, one has to check that the joint distributions of the sets of likelihood ratio statistics $\{L_k(\tilde{\theta}_k) - L_k(\theta_k^*) : k = 1, \ldots, K\}$ and $\{L_k^{\circ}(\tilde{\theta}_k^{\circ}) - L_k^{\circ}(\tilde{\theta}_k) : k = 1, \ldots, K\}$ are close to each other. These joint distributions are approximated using several non-asymptotic steps given in the following scheme:


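$$
\begin{array}{lccccc}
\mathbf{Y}\text{-world:} & \sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} & \approx & \|\xi_k\| & \approx & \|\bar{\xi}_k\| \\
 & & & & & \big\updownarrow \\
\text{Bootstrap world:} & \sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)} & \approx & \|\xi_k^{\circ}\| & \approx & \|\bar{\xi}_k^{\circ}\|
\end{array}
\qquad 1 \le k \le K \qquad (3.1)
$$

In both rows the first approximation is the uniform square-root Wilks theorem, with an error of order $(p_k + \log K)/\sqrt{n}$. The second approximation is the joint Gaussian approximation combined with anti-concentration*. The vertical link between the two rows is the simultaneous Gaussian comparison**.

*The accuracy of these approximating steps is $\bigl\{p_{\max}^3 \log^9(K) \log^3(n p_{\mathrm{sum}})/n\bigr\}^{1/8}$.

**The Gaussian comparison step yields an approximation error proportional to $\delta_{\mathrm{SmB}}^2\, p_{\max} \log^2(K) \log^{3/4}(n p_{\mathrm{sum}})$, where $\delta_{\mathrm{SmB}}$ comes from condition (SmB); see also (3.4) below.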


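Here $\xi_k$ and $\xi_k^{\circ}$ denote the normalized score vectors for the $\mathbf{Y}$ and bootstrap likelihood processes:

$$ \xi_k \stackrel{\text{def}}{=} D_k^{-1}\nabla_\theta L_k(\theta_k^*), \qquad \xi_k^{\circ} \stackrel{\text{def}}{=} \xi_k^{\circ}(\theta_k^*) \stackrel{\text{def}}{=} D_k^{-1}\nabla_\theta\bigl\{L_k^{\circ}(\theta_k^*) - \mathbb{E}^{\circ}L_k^{\circ}(\theta_k^*)\bigr\}, \qquad (3.2) $$

where $D_k$ is the full Fisher information matrix for the corresponding $k$-th likelihood:

$$ D_k^2 \stackrel{\text{def}}{=} -\nabla_\theta^2\, \mathbb{E} L_k(\theta_k^*). $$

$\bar{\xi}_k \sim \mathcal{N}(0, \operatorname{Var}\xi_k)$ and $\bar{\xi}_k^{\circ} \sim \mathcal{N}(0, \operatorname{Var}^{\circ}\xi_k^{\circ})$ denote approximating Gaussian vectors, which have the same covariance matrices as $\xi_k$ and $\xi_k^{\circ}$. Moreover, the vectors $(\bar{\xi}_1^\top, \ldots, \bar{\xi}_K^\top)^\top$ and $(\bar{\xi}_1^{\circ\top}, \ldots, \bar{\xi}_K^{\circ\top})^\top$ are normally distributed and have the same covariance matrices as the vectors $(\xi_1^\top, \ldots, \xi_K^\top)^\top$ and $(\xi_1^{\circ\top}, \ldots, \xi_K^{\circ\top})^\top$, respectively. $\operatorname{Var}^{\circ}$ and $\operatorname{Cov}^{\circ}$ denote the variance and covariance operators w.r.t. the probability measure $\mathbb{P}^{\circ}$ conditional on $\mathbf{Y}$.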

The first two approximating steps, the square-root Wilks and Gaussian approximations, are performed in parallel for both the $\mathbf{Y}$ and bootstrap worlds, as shown in the corresponding lines of the scheme (3.1). The two worlds are connected in the last step: Gaussian comparison for $\ell_2$-norms of Gaussian vectors. All the approximations are performed simultaneously for $K$ parametric models.

Let us consider each step in more detail. The non-asymptotic square-root Wilks approximation result was obtained recently by Spokoiny (2012a, 2013). It says that for a fixed sample size and a misspecified parametric assumption, $\mathbb{P} \notin \{\mathbb{P}_k(\theta)\}$, it holds with exponentially high probability that

$$ \sqrt{2\{L_k(\tilde{\theta}_k) - L_k(\theta_k^*)\}} \approx \|\xi_k\|; $$

here the index $k$ is fixed, i.e. this statement is for one parametric model. The precise statement of this result is given in Section B.1, and its simultaneous version in Section B.3. The approximating value $\|\xi_k\|$ is the $\ell_2$-norm of the score vector given in (3.2).

The next approximating step is between the joint distributions of $\|\xi_1\|, \ldots, \|\xi_K\|$ and $\|\bar{\xi}_1\|, \ldots, \|\bar{\xi}_K\|$. This is done in Section A.1 for general centered random vectors under bounded exponential moment assumptions. The main tools for the simultaneous Gaussian approximation are Lindeberg's telescopic sum, a smooth maximum function, and a three times differentiable approximation of the indicator function $\mathbb{1}\{x \in \mathbb{R} : x > 0\}$. The simultaneous anti-concentration inequality for the $\ell_2$-norms of Gaussian vectors is obtained in Section A.3. This result rests on two ingredients: approximating the $\ell_2$-norm by a maximum over a finite grid on a hypersphere, and the anti-concentration inequality for maxima of a Gaussian random vector by Chernozhukov et al. (2014c). The same approximating steps are performed for the bootstrap world; the square-root bootstrap Wilks approximation is given in Sections B.2 and B.3.

The last step in the scheme (3.1) compares the joint distributions of the two sets of $\ell_2$-norms of Gaussian vectors, $\|\bar{\xi}_1\|, \ldots, \|\bar{\xi}_K\|$ and $\|\bar{\xi}_1^{\circ}\|, \ldots, \|\bar{\xi}_K^{\circ}\|$, by Slepian interpolation (see Section A.2 for the result in a general setting). The error of this approximation is proportional, up to a factor depending on the dimensions $p_k$, to

$$ \max_{1 \le k_1, k_2 \le K} \bigl\|\operatorname{Cov}(\bar{\xi}_{k_1}, \bar{\xi}_{k_2}) - \operatorname{Cov}^{\circ}(\bar{\xi}_{k_1}^{\circ}, \bar{\xi}_{k_2}^{\circ})\bigr\|_{\max}. \qquad (3.3) $$

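It is shown, using the Bernstein matrix inequality (Sections C.1 and C.3), that the value (3.3) is bounded from above (up to a constant), on a random set of dominating probability, by the left-hand side of

$$ \max_{1 \le k \le K} \bigl\|H_k^{-1} B_k^2 H_k^{-1}\bigr\| \le \delta_{\mathrm{SmB}}^2 \qquad (3.4) $$

for

$$ B_k^2 \stackrel{\text{def}}{=} \sum_{i=1}^n \mathbb{E}\{\nabla_\theta \ell_{i,k}(\theta_k^*)\}\,\mathbb{E}\{\nabla_\theta \ell_{i,k}(\theta_k^*)\}^\top, \qquad H_k^2 \stackrel{\text{def}}{=} \sum_{i=1}^n \mathbb{E}\bigl\{\nabla_\theta \ell_{i,k}(\theta_k^*)\,\nabla_\theta \ell_{i,k}(\theta_k^*)^\top\bigr\}. $$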

The value $\|H_k^{-1} B_k^2 H_k^{-1}\|$ is responsible for the modelling bias of the $k$-th model. If the parametric family $\{\mathbb{P}_k(\theta)\}$ contains the true distribution $\mathbb{P}$ or if the observations $Y_i$ are i.i.d., then $B_k^2$ equals zero. Condition (SmB) assumes that all the values $\|H_k^{-1} B_k^2 H_k^{-1}\|$ are rather small.
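For the local constant model of Example 1 with Gaussian noise these quantities are scalars and can be computed directly. Suppose $Y_i = f(X_i) + \sigma\varepsilon_i$ with standard Gaussian $\varepsilon_i$. Since $\nabla_\theta \ell_{i,k}(\theta) = w_i(x_k,h)(Y_i - \theta)$, one gets $B_k^2 = \sum_i w_i^2 (f(X_i) - \theta_k^*)^2$ and $H_k^2 = \sum_i w_i^2\{\sigma^2 + (f(X_i) - \theta_k^*)^2\}$. The Python sketch below (NumPy; hypothetical helper names) evaluates $H_k^{-1} B_k^2 H_k^{-1}$ for the regression function (4.2) of Section 4.3 with $\sigma = 1$, $n = 400$ and $h = 0.12$. The value vanishes where $f$ is constant over the kernel window and is large where the window covers the non-constant part of $f$.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def f(x):
    """Regression function (4.2) of Section 4.3."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, 5.0)
    up = (x >= 0.25) & (x <= 0.45)
    down = (x > 0.45) & (x <= 0.65)
    out[up] = 5.0 + 3.8 * (1.0 - 100.0 * (x[up] - 0.35) ** 2)
    out[down] = 5.0 - 3.8 * (1.0 - 100.0 * (x[down] - 0.55) ** 2)
    return out


def modelling_bias(x, X, h, sigma=1.0):
    """H^{-1} B^2 H^{-1} (a scalar here) for the local constant model at x when
    Y_i = f(X_i) + sigma * eps_i; grad ell_i(theta*) = w_i (Y_i - theta*)."""
    w = epanechnikov((x - X) / h)
    theta_star = np.sum(w * f(X)) / np.sum(w)
    mean_grad = w * (f(X) - theta_star)                  # E grad ell_i(theta*)
    B2 = np.sum(mean_grad ** 2)                          # B^2
    H2 = np.sum(w ** 2 * sigma ** 2 + mean_grad ** 2)    # H^2 = sum_i E (grad ell_i)^2
    return B2 / H2


X = np.linspace(0.0, 1.0, 400)
for x in (0.1, 0.35, 0.55, 0.9):
    print(x, modelling_bias(x, X, h=0.12))
```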
3.2 Main results


The following theorem shows the closeness of the joint cumulative distribution functions (c.d.f.s) of $\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)},\ k = 1, \ldots, K\}$ and $\{\sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)},\ k = 1, \ldots, K\}$. The approximating error term $\Delta_{\mathrm{total}}$ equals the sum of the errors from all the steps in the scheme (3.1).


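Theorem 3.1. Under the conditions of Section 5 it holds with probability $\ge 1 - 12e^{-\mathtt{x}}$ for $\mathfrak{z}_k \ge C\sqrt{p_k}$, $1 \le C < 2$:

$$ \Bigl|\mathbb{P}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} > \mathfrak{z}_k\Bigr\}\Bigr) - \mathbb{P}^{\circ}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)} > \mathfrak{z}_k\Bigr\}\Bigr)\Bigr| \le \Delta_{\mathrm{total}}. $$

The approximating total error $\Delta_{\mathrm{total}} \ge 0$ is deterministic, and in the case of i.i.d. observations (see Section 5.3) it holds:

$$ \Delta_{\mathrm{total}} \le C \log^{9/8}(K) \log^{3/8}(n p_{\mathrm{sum}}) \bigl\{(\mathtt{a}^2 + \mathtt{a}_B^2)(1 + \delta^2(\mathtt{x}))\bigr\}^{3/8} \Bigl(\frac{p_{\max}^3}{n}\Bigr)^{1/8}, \qquad (3.6) $$

where the deterministic terms $\mathtt{a}^2$, $\mathtt{a}_B^2$ and $\delta^2(\mathtt{x})$ come from the conditions $(\mathcal{I})$, $(\mathcal{I}_B)$ and $(\mathcal{SD}_1)$. $\Delta_{\mathrm{total}}$ is defined in (C.5).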


Remark 3.1. The obtained approximation bound is mainly of theoretical interest, although it shows the impact of $p_{\max}$, $K$ and $n$ on the quality of the bootstrap procedure. For more details on the error term see Remark A.1.

The next theorem justifies the bootstrap procedure under the (SmB) condition. It says that the bootstrap quantile functions $\mathfrak{z}_k^{\circ}(\cdot)$, evaluated at the confidence levels $1 - c^{\circ}(\alpha)$ corrected for multiplicity by the bootstrap, can be used to construct the simultaneous confidence set in the $\mathbf{Y}$-world.


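Theorem 3.2 (Bootstrap validity for a small modeling bias). Assume the conditions of Theorem 3.1, and $c(\alpha), 0.5c^{\circ}(\alpha) \ge \Delta_{\mathrm{full,max}}$. Then for $\alpha \le 1 - 8e^{-\mathtt{x}}$ it holds with probability $\ge 1 - 12e^{-\mathtt{x}}$

$$ \mathbb{P}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} > \mathfrak{z}_k^{\circ}\bigl(c^{\circ}(\alpha) - 2\Delta_{\mathrm{full,max}}\bigr)\Bigr\}\Bigr) - \alpha \le \Delta_{\mathfrak{z},\mathrm{total}}, $$

$$ \mathbb{P}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} > \mathfrak{z}_k^{\circ}\bigl(c^{\circ}(\alpha) + 2\Delta_{\mathrm{full,max}}\bigr)\Bigr\}\Bigr) - \alpha \ge -\Delta_{\mathfrak{z},\mathrm{total}}, $$

where $\Delta_{\mathrm{full,max}} \le C\{(p_{\max} + \mathtt{x})^3/n\}^{1/8}$ in the case of i.i.d. observations (see Section 5.3), and $\Delta_{\mathfrak{z},\mathrm{total}} \le 3\Delta_{\mathrm{total}}$; their explicit definitions are given in (C.11) and (C.14). Moreover,

$$ c^{\circ}(\alpha) \le c(\alpha + \Delta_c) + 2\Delta_{\mathrm{full,max}}, \qquad c^{\circ}(\alpha) \ge c(\alpha - \Delta_c) - 2\Delta_{\mathrm{full,max}} $$

for $0 \le \Delta_c \le 2\Delta_{\mathrm{total}}$, defined in (C.15).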

The following theorem does not assume the (SmB) condition to be fulfilled. It turns out that in this case the bootstrap procedure becomes conservative: the bootstrap critical values corrected for multiplicity, $\mathfrak{z}_k^{\circ}(c^{\circ}(\alpha))$, are increased by the modelling bias $\sqrt{\operatorname{tr}\{D_k^{-1}H_k^2D_k^{-1}\}} - \sqrt{\operatorname{tr}\{D_k^{-1}(H_k^2 - B_k^2)D_k^{-1}\}}$. Therefore, the confidence set based on the bootstrap estimates can be conservative.

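Theorem 3.3 (Bootstrap conservativeness for a large modeling bias). Under the conditions of Section 5 except for (SmB) it holds with probability $\ge 1 - 14e^{-\mathtt{x}}$ for $\mathfrak{z}_k \ge C\sqrt{p_k}$, $1 \le C < 2$:

$$ \mathbb{P}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k(\tilde{\theta}_k) - 2L_k(\theta_k^*)} > \mathfrak{z}_k\Bigr\}\Bigr) \le \mathbb{P}^{\circ}\Bigl(\bigcup_{k=1}^K \Bigl\{\sqrt{2L_k^{\circ}(\tilde{\theta}_k^{\circ}) - 2L_k^{\circ}(\tilde{\theta}_k)} > \mathfrak{z}_k\Bigr\}\Bigr) + \Delta_{b,\mathrm{total}}. $$

The deterministic value $\Delta_{b,\mathrm{total}}$ lies in $[0, 2\Delta_{\mathrm{total}}]$ (see (3.6) for the case of Section 5.3). Moreover, the bootstrap-corrected confidence level $1 - c^{\circ}(\alpha)$ is conservative in comparison with the true corrected confidence level:

$$ 1 - c^{\circ}(\alpha) \ge 1 - c(\alpha + \Delta_{b,c}) - \Delta_{\mathrm{full,max}}, $$

and it holds for all $k = 1, \ldots, K$ and $\alpha \le 1 - 8e^{-\mathtt{x}}$

$$ \mathfrak{z}_k^{\circ}(c^{\circ}(\alpha)) \ge \mathfrak{z}_k\bigl(c(\alpha + \Delta_{b,c}) + \Delta_{\mathrm{full,max}}\bigr) + \sqrt{\operatorname{tr}\{D_k^{-1}H_k^2D_k^{-1}\}} - \sqrt{\operatorname{tr}\{D_k^{-1}(H_k^2 - B_k^2)D_k^{-1}\}} - \Delta_{\mathrm{qf},1,k} $$

for $0 \le \Delta_{b,c} \le 2\Delta_{\mathrm{total}}$, defined in (C.18). The positive value $\Delta_{\mathrm{qf},1,k}$ is bounded from above by $(\mathtt{a}_k^2 + \mathtt{a}_{B,k}^2)(\sqrt{8\mathtt{x}p_k} + 6\mathtt{x})$ for the constants $\mathtt{a}_k^2 > 0$, $\mathtt{a}_{B,k}^2 > 0$ from conditions $(\mathcal{I})$, $(\mathcal{I}_B)$.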

The (SmB) condition is automatically fulfilled if all the parametric models are correct, or in the case of i.i.d. observations. This condition is checked for the generalised linear model and linear quantile regression in Spokoiny and Zhilova (2014) (the 2015 version).
4 Numerical experiments

Here we check the performance of the bootstrap procedure by constructing simultaneous confidence sets based on the local constant and local quadratic estimates; the former is also known as the Nadaraya-Watson estimate (Nadaraya (1964); Watson (1964)). Let $Y_1, \ldots, Y_n$ be independent random scalar observations and $X_1, \ldots, X_n$ some deterministic design points. In Sections 4.1-4.3 below we introduce the models and the data; Sections 4.4-4.6 present the results of the experiments.


4.1 Local constant regression 


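Consider the following quadratic likelihood function reweighted with the kernel functions $K(\cdot)$:

$$ L(\theta, x, h) \stackrel{\text{def}}{=} -\frac{1}{2}\sum_{i=1}^n (Y_i - \theta)^2 w_i(x, h), $$

$$ w_i(x, h) \stackrel{\text{def}}{=} K(\{x - X_i\}/h), $$

$$ K(x) \in [0, 1], \quad \int_{\mathbb{R}} K(x)\,dx = 1, \quad K(x) = K(-x). $$

Here $h > 0$ denotes the bandwidth, the local smoothing parameter. The target point and the local MLE read as:

$$ \theta^*(x, h) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^n w_i(x, h)\,\mathbb{E} Y_i}{\sum_{i=1}^n w_i(x, h)}, \qquad \tilde{\theta}(x, h) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^n w_i(x, h)\,Y_i}{\sum_{i=1}^n w_i(x, h)}. $$

Let us fix a bandwidth $h$ and consider the range of points $x_1, \ldots, x_K$. They yield $K$ local constant models with the target parameters $\theta_k^* \stackrel{\text{def}}{=} \theta^*(x_k, h)$ and the likelihood functions $L_k(\theta) \stackrel{\text{def}}{=} L(\theta, x_k, h)$ for $k = 1, \ldots, K$.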

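The bootstrap local likelihood function is defined similarly to the global one (2.2), by reweighting $L(\theta, x, h)$ with the bootstrap multipliers $u_1, \ldots, u_n$:

$$ L_k^{\circ}(\theta) \stackrel{\text{def}}{=} L^{\circ}(\theta, x_k, h) \stackrel{\text{def}}{=} -\frac{1}{2}\sum_{i=1}^n (Y_i - \theta)^2 w_i(x_k, h)\,u_i, $$

$$ \tilde{\theta}_k^{\circ} \stackrel{\text{def}}{=} \tilde{\theta}^{\circ}(x_k, h) \stackrel{\text{def}}{=} \frac{\sum_{i=1}^n w_i(x_k, h)\,u_i Y_i}{\sum_{i=1}^n w_i(x_k, h)\,u_i}. $$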

4.2 Local quadratic regression 

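Here the local likelihood function reads as

$$L(\theta, x, h) \stackrel{\mathrm{def}}{=} -\frac{1}{2}\sum_{i=1}^n (Y_i - \Psi_i^{\top}\theta)^2\, w_i(x,h), \qquad \Psi_i \stackrel{\mathrm{def}}{=} (1, X_i, X_i^2)^{\top} \in \mathbb{R}^3,$$

and

$$\theta^*(x,h) \stackrel{\mathrm{def}}{=} \{\Psi W(x,h)\Psi^{\top}\}^{-1}\Psi W(x,h)\,\mathbb{E}Y, \qquad \tilde\theta(x,h) \stackrel{\mathrm{def}}{=} \{\Psi W(x,h)\Psi^{\top}\}^{-1}\Psi W(x,h)\,Y,$$

where

$$Y \stackrel{\mathrm{def}}{=} (Y_1, \dots, Y_n)^{\top}, \qquad \Psi \stackrel{\mathrm{def}}{=} (\Psi_1, \dots, \Psi_n) \in \mathbb{R}^{3\times n}, \qquad W(x,h) \stackrel{\mathrm{def}}{=} \mathrm{diag}\{w_1(x,h), \dots, w_n(x,h)\}.$$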

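And similarly for the bootstrap objects

$$L^{\circ}(\theta, x, h) \stackrel{\mathrm{def}}{=} -\frac{1}{2}\sum_{i=1}^n (Y_i - \Psi_i^{\top}\theta)^2\, w_i(x,h)\,u_i, \qquad \tilde\theta^{\circ}(x,h) \stackrel{\mathrm{def}}{=} \{\Psi U W(x,h)\Psi^{\top}\}^{-1}\Psi U W(x,h)\,Y,$$

for $U \stackrel{\mathrm{def}}{=} \mathrm{diag}\{u_1, \dots, u_n\}$.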
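A short sketch of the weighted least squares formulas above follows. It assumes the same Epanechnikov kernel as in the local constant case, and `local_quadratic` is a hypothetical helper name. It solves the $3\times 3$ normal equations rather than forming the inverse of $\Psi W(x,h)\Psi^{\top}$ explicitly.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2) on |t| <= 1."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)


def local_quadratic(x, X, Y, h, u=None):
    """theta~(x, h) = (Psi W Psi^T)^{-1} Psi W Y with Psi_i = (1, X_i, X_i^2)^T.

    If bootstrap multipliers u are given, W is replaced by U W, which gives theta~°(x, h).
    """
    X = np.asarray(X, dtype=float)
    Psi = np.vstack([np.ones_like(X), X, X ** 2])   # 3 x n matrix with columns Psi_i
    w = epanechnikov((x - X) / h)                    # diagonal of W(x, h)
    if u is not None:
        w = w * u                                    # diagonal of U W(x, h)
    PsiW = Psi * w                                   # Psi W: column i scaled by w_i
    return np.linalg.solve(PsiW @ Psi.T, PsiW @ Y)   # solve the normal equations
```

If the data are exactly quadratic in $X_i$, the estimate reproduces the coefficients. This is a convenient sanity check of the formula.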


4.3 Simulated data 

In the numerical experiments we constructed two 90% simultaneous confidence bands: one using Monte Carlo (MC) samples and one using the bootstrap procedure with Gaussian weights ($u_i \sim \mathcal{N}(1,1)$). We used $10^4$ independent samples $\{Y_i\}$ and $10^4$ independent samples $\{u_i\}$, respectively. The sample size is $n = 400$, and $K(x)$ is the Epanechnikov kernel function. The independent random observations $Y_i$ are generated as follows:

$$Y_i = f(X_i) + \mathcal{N}(0,1), \qquad X_i \text{ are equidistant on } [0,1], \tag{4.1}$$

$$f(x) = \begin{cases} 5, & x \in [0, 0.25] \cup [0.65, 1];\\ 5 + 3.8\{1 - 100(x - 0.35)^2\}, & x \in [0.25, 0.45];\\ 5 - 3.8\{1 - 100(x - 0.55)^2\}, & x \in [0.45, 0.65]. \end{cases} \tag{4.2}$$

The number of local models is $K = 71$, and the points $x_1, \dots, x_K$ are equidistant on $[0,1]$. For the bandwidth we considered two cases: $h = 0.12$ and $h = 0.3$.
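The data-generating process (4.1)-(4.2) can be reproduced roughly as follows. The sketch assumes that "equidistant on $[0,1]$" includes both endpoints. The names `f_true`, `simulate` and `x_centres` are illustrative only. The two bumps meet the constant level continuously at $x = 0.25, 0.45, 0.65$.

```python
import numpy as np


def f_true(x):
    """Regression function (4.2): level 5 with a bump up on [0.25, 0.45] and down on [0.45, 0.65]."""
    x = np.asarray(x, dtype=float)
    bump_up = 5.0 + 3.8 * (1.0 - 100.0 * (x - 0.35) ** 2)
    bump_down = 5.0 - 3.8 * (1.0 - 100.0 * (x - 0.55) ** 2)
    return np.where((x >= 0.25) & (x <= 0.45), bump_up,
                    np.where((x > 0.45) & (x <= 0.65), bump_down, 5.0))


def simulate(n=400, rng=None):
    """One data set from (4.1): equidistant design on [0, 1] (endpoints included) and N(0, 1) noise."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.linspace(0.0, 1.0, n)
    return X, f_true(X) + rng.standard_normal(n)


x_centres = np.linspace(0.0, 1.0, 71)  # centres x_1, ..., x_K of the K = 71 local models
```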


4.4 Effect of the modeling bias on the width of a bootstrap confidence band

The function $f(x)$ defined in (4.2) should yield a considerable modeling bias for both the local constant and the local quadratic estimators. Figures 4.1 and 4.2 demonstrate that the bootstrap confidence bands become conservative when the local model is misspecified, i.e. wider than the MC confidence band. The top graphs in Figures 4.1 and 4.2 show the 90% confidence bands. The middle graphs show their widths, and the bottom graphs show the value of the modeling bias for the $K = 71$ local models (see formulas (4.3) and (4.4) below). For the local constant estimate (Figure 4.1), the modeling bias considerably increases the width of the bootstrap confidence sets when $x \in [0.25, 0.65]$. In this case the expression for the modeling bias term for the $k$-th model (see also condition (SmB)) reads as:


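$$\|H_k^{-1}B_k^2H_k^{-1}\| = \frac{\sum_{i=1}^n\{\mathbb{E}Y_i - \theta^*(x_k)\}^2 w_i^2(x_k,h)}{\sum_{i=1}^n\mathbb{E}\{Y_i - \theta^*(x_k)\}^2 w_i^2(x_k,h)} = 1 - \Big\{1 + \frac{\sum_{i=1}^n w_i^2(x_k,h)\{f(X_i) - \theta^*(x_k)\}^2}{\sum_{i=1}^n w_i^2(x_k,h)}\Big\}^{-1}. \tag{4.3}$$

And for the local quadratic estimate it holds:

$$\|H_k^{-1}B_k^2H_k^{-1}\| = \Big\|I_p - H_k^{-1}\Big\{\sum_{i=1}^n\Psi_i\Psi_i^{\top}w_i^2(x_k,h)\Big\}H_k^{-1}\Big\|, \tag{4.4}$$

where $I_p$ is the identity matrix of dimension $p \times p$ (here $p = 3$), and

$$H_k^2 = \sum_{i=1}^n\Psi_i\Psi_i^{\top}w_i^2(x_k,h)\,\mathbb{E}\{Y_i - \Psi_i^{\top}\theta^*(x_k)\}^2 = \sum_{i=1}^n\Psi_i\Psi_i^{\top}w_i^2(x_k,h)\{f(X_i) - \Psi_i^{\top}\theta^*(x_k)\}^2 + \sum_{i=1}^n\Psi_i\Psi_i^{\top}w_i^2(x_k,h). \tag{4.5}$$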


Therefore, if $\max_{i:\,w_i(x_k,h) > 0}\{f(X_i) - \theta^*(x_k)\}^2 = 0$, then $\|H_k^{-1}B_k^2H_k^{-1}\| = 0$. In Figure 4.1, both the modeling bias and the difference between the widths of the bootstrap and MC confidence bands are close to zero in the regions where the true function $f(x)$ is constant. In Figure 4.2 the modeling bias for $h = 0.12$ is overall smaller than the corresponding value in Figure 4.1. For the bigger bandwidth $h = 0.3$, the modeling biases in Figures 4.1 and 4.2 are comparable with each other.

Thus the numerical experiment is consistent with the theoretical results from Section 3.2. It confirms that the simultaneous bootstrap confidence set is valid when a (local) parametric model is close to the true distribution. Otherwise the bootstrap procedure is conservative: the modeling bias widens the simultaneous bootstrap confidence set.
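The modeling bias term (4.3) for the local constant model can be evaluated directly from $f$. The following sketch assumes $\mathbb{E}Y_i = f(X_i)$ and unit noise variance, as in (4.1). Under these assumptions $B_k^2 = \sum_i w_i^2\{f(X_i) - \theta^*(x_k)\}^2$ and $H_k^2 = B_k^2 + \sum_i w_i^2$. The helper names are hypothetical, and the kernel and $f$ are redefined so that the block runs on its own.

```python
import numpy as np


def epanechnikov(t):
    """Epanechnikov kernel 0.75 * (1 - t^2) on |t| <= 1."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)


def f_true(x):
    """Regression function (4.2)."""
    x = np.asarray(x, dtype=float)
    up = 5.0 + 3.8 * (1.0 - 100.0 * (x - 0.35) ** 2)
    down = 5.0 - 3.8 * (1.0 - 100.0 * (x - 0.55) ** 2)
    return np.where((x >= 0.25) & (x <= 0.45), up,
                    np.where((x > 0.45) & (x <= 0.65), down, 5.0))


def bias_ratio_local_constant(x_k, X, h, f=f_true):
    """||H_k^{-1} B_k^2 H_k^{-1}|| from (4.3) for the local constant model centred at x_k."""
    w = epanechnikov((x_k - X) / h)
    fx = f(X)
    theta_star = np.sum(w * fx) / np.sum(w)           # theta*(x_k, h), since E Y_i = f(X_i)
    bias = np.sum(w ** 2 * (fx - theta_star) ** 2)    # B_k^2 (squared-bias part)
    noise = np.sum(w ** 2)                            # noise part of H_k^2 (unit variance)
    return bias / (bias + noise)                      # equals 1 - {1 + bias / noise}^{-1}


X = np.linspace(0.0, 1.0, 400)
x_centres = np.linspace(0.0, 1.0, 71)
ratios = {h: np.array([bias_ratio_local_constant(x, X, h) for x in x_centres]) for h in (0.12, 0.3)}
```

The ratio vanishes when the kernel window lies where $f$ is constant, and it is positive over the bumps. This matches the qualitative description of Figure 4.1.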


Figure 4.1: Local constant regression: confidence bands, their widths, and the modeling bias
[Panels: bandwidth = 0.12 and bandwidth = 0.3. Top graphs: 90% bootstrap simultaneous confidence band, 90% MC simultaneous confidence band, smoothed target function, the true function f(x), local constant MLE. Middle and bottom graphs: widths of the 90% bootstrap and MC confidence bands from the upper graphs; modeling bias from expression (4.3).]

Figure 4.2: Local quadratic regression: confidence bands, their widths, and the modeling bias
[Panels: bandwidth = 0.12 and bandwidth = 0.3. Top graphs: 90% bootstrap simultaneous confidence band, 90% MC simultaneous confidence band, smoothed target function, the true function f(x), local quadratic MLE. Middle and bottom graphs: widths of the 90% bootstrap and MC confidence bands from the upper graphs; modeling bias from expression (4.4).]

4.5 Effective coverage probability (local constant estimate)

In this part of the experiment we check the bootstrap validity by computing the effective coverage probabilities. This requires performing many independent experiments. For each of 5000 independent samples $\{Y_i\} \sim$ (4.1), we took $10^4$ independent bootstrap samples $\{u_i\} \sim \mathcal{N}(1,1)$. We then constructed simultaneous bootstrap confidence sets for a range of confidence levels. The second row of Table 1 contains this range, $(1 - \alpha) = 0.95, 0.90, \dots, 0.50$. The third and fourth rows of Table 1 show the frequencies of the event

$$\max_{1\le k\le K}\big\{L_k(\tilde\theta_k) - L_k(\theta^*_k) - \mathfrak{z}^{\circ}_k(c^{\circ}(\alpha))\big\} \le 0$$

among the 5000 data samples, for the bandwidths $h = 0.12, 0.3$ and for the range of $(1 - \alpha)$. The results show that the bootstrap procedure is rather conservative for both $h = 0.12$ and $h = 0.3$, and that the larger bandwidth yields bigger coverage probabilities.
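The multiplicity correction and the coverage event can be sketched from a matrix of resampled statistics, with one row per MC or bootstrap draw and one column per local model. This is an illustration under an assumption. We take $c^{\circ}(\alpha)$ to be the largest marginal level $c$ for which the resampled probability that at least one of the $K$ statistics exceeds its marginal $(1-c)$-quantile does not exceed $\alpha$. The exact definitions are (1.8) and (2.4), and the function names are hypothetical. The search uses bisection, which is valid because this exceedance probability is non-decreasing in $c$.

```python
import numpy as np


def simultaneous_critical_values(T, alpha, tol=1e-5):
    """Multiplicity-corrected critical values from resampled statistics (a sketch).

    T     : array of shape (B, K); row b holds the K statistics of one resampling draw.
    alpha : simultaneous error level; the target confidence is 1 - alpha.
    Returns (c, q): the largest marginal level c (up to tol) such that the share of rows
    in which some T[b, k] exceeds its marginal (1 - c)-quantile q_k is at most alpha.
    """
    T = np.asarray(T, dtype=float)

    def union_exceedance(c):
        q = np.quantile(T, 1.0 - c, axis=0)          # marginal (1 - c)-quantiles, shape (K,)
        return np.mean(np.any(T > q, axis=1)), q     # share of rows with at least one exceedance

    lo, hi = 0.0, 1.0                                # exceedance is 0 at c = 0 and grows with c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if union_exceedance(mid)[0] <= alpha:
            lo = mid
        else:
            hi = mid
    return lo, union_exceedance(lo)[1]


def covered(stats, q):
    """Coverage event max_k (stats_k - q_k) <= 0 for one data sample."""
    return bool(np.max(np.asarray(stats, dtype=float) - q) <= 0.0)
```

In a coverage experiment of this kind, the rows of `T` would hold the $K$ bootstrap likelihood ratios $L^{\circ}_k(\tilde\theta^{\circ}_k) - L^{\circ}_k(\tilde\theta_k)$. The vector `stats` would hold the $K$ values $L_k(\tilde\theta_k) - L_k(\theta^*_k)$ of one data sample. The effective coverage is then the share of data samples for which `covered` returns `True`. For independent columns, the result should be close to the Šidák level $1-(1-\alpha)^{1/K}$. For identical columns it should be close to $\alpha$.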


Table 1: Effective coverage probabilities for the local constant regression

                 Confidence levels
h       0.95   0.90   0.85   0.80   0.75   0.70   0.65   0.60   0.55   0.50
0.12    0.971  0.947  0.917  0.888  0.863  0.830  0.800  0.769  0.738  0.702
0.3     0.982  0.963  0.942  0.918  0.895  0.868  0.842  0.815  0.784  0.750


4.6 Correction for multiplicity 

Here we compare the Monte Carlo (MC) and the bootstrap corrections for multiplicity, i.e. the values $c(\alpha)$ and $c^{\circ}(\alpha)$ defined in (1.8) and (2.4). The numerical results in Tables 2 and 3 are based on $10^4$ independent samples $\{Y_i\} \sim$ (4.1) and $10^4$ independent bootstrap samples $\{u_i\} \sim \mathcal{N}(1,1)$.

The second line in Tables 2 and 3 contains the range of the nominal confidence levels $(1 - \alpha) = 0.95, 0.90, \dots, 0.50$ (as in Table 1). The first column contains the values of the bandwidth $h = 0.12, 0.3$. The second column contains the resampling method (r.m.): Monte Carlo (MC) or bootstrap (B). The Monte Carlo experiment yields the corrected confidence levels $1 - c(\alpha)$, and the bootstrap yields $1 - c^{\circ}(\alpha)$. Lines 3-6 contain the average values of $1 - c(\alpha)$ and $1 - c^{\circ}(\alpha)$ over all the experiments.

The results show that, for the smaller bandwidth, both the MC and the bootstrap corrections are bigger than those for the larger bandwidth. For a smaller bandwidth the local models overlap less with each other. Hence the corrections for multiplicity are closer to the Bonferroni bound.
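For reference, the Bonferroni bound mentioned above corresponds to the marginal level $1 - \alpha/K$. For example, with $1-\alpha = 0.95$ and $K = 71$ this gives $1 - 0.05/71 \approx 0.9993$. That is above every corrected level in Tables 2 and 3. The sketch below tabulates this bound over the nominal levels of the tables. It also shows the Šidák level $(1-\alpha)^{1/K}$, which is exact for independent statistics. The Šidák level is included only as an additional reference point and is not used in the paper.

```python
import numpy as np


def bonferroni_level(alpha, K):
    """Marginal confidence level 1 - alpha / K of the Bonferroni correction."""
    return 1.0 - alpha / K


def sidak_level(alpha, K):
    """Marginal confidence level (1 - alpha)^(1/K); exact for K independent statistics."""
    return (1.0 - alpha) ** (1.0 / K)


K = 71
for level in np.round(np.arange(0.95, 0.49, -0.05), 2):   # nominal levels 0.95, 0.90, ..., 0.50
    alpha = 1.0 - level
    print(f"{level:.2f}: Bonferroni {bonferroni_level(alpha, K):.4f}, Sidak {sidak_level(alpha, K):.4f}")
```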

Remark 4.1. The theoretical results of this paper can be extended to the case when the set of considered local models has the cardinality of the continuum and the confidence bands are uniform w.r.t. the local parameter. This extension would require uniform statements such as a locally uniform square-root Wilks approximation (see e.g. Spokoiny and Zhilova (2013)).

Remark 4.2. The use of the bootstrap procedure in the problem of choosing an optimal 
bandwidth is considered in Spokoiny and Willrich (2015). 
Table 2: Local constant regression: MC vs bootstrap confidence levels corrected for multiplicity

                      Confidence levels
h       r.m.   0.95   0.90   0.85   0.80   0.75   0.70   0.65   0.60   0.55   0.50
0.12    MC     0.997  0.994  0.989  0.985  0.980  0.975  0.969  0.963  0.956  0.949
        B      0.998  0.995  0.991  0.988  0.984  0.979  0.975  0.969  0.963  0.957
0.3     MC     0.993  0.983  0.973  0.962  0.949  0.936  0.922  0.906  0.891  0.873
        B      0.994  0.986  0.977  0.968  0.958  0.947  0.935  0.922  0.908  0.893


Table 3: Local quadratic regression: MC vs bootstrap confidence levels corrected for multiplicity

                      Confidence levels
h       r.m.   0.95   0.90   0.85   0.80   0.75   0.70   0.65   0.60   0.55   0.50
0.12    MC     0.997  0.993  0.989  0.985  0.979  0.974  0.968  0.961  0.954  0.946
        B      0.998  0.995  0.991  0.988  0.984  0.979  0.974  0.969  0.963  0.956
0.3     MC     0.993  0.983  0.973  0.961  0.949  0.936  0.921  0.904  0.887  0.868
        B      0.996  0.991  0.985  0.978  0.971  0.963  0.954  0.944  0.934  0.923


5 Conditions 

Here we list the conditions required for the main results. The conditions in Section 5.1 come from the general finite sample theory by Spokoiny (2012a); they are required for the results of Sections B.1 and B.2. The conditions in Section 5.2 are needed to prove the statements on multiplier bootstrap validity.

5.1 Basic conditions 

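Introduce the stochastic part of the $k$-th likelihood process, $\zeta_k(\theta) \stackrel{\mathrm{def}}{=} L_k(\theta) - \mathbb{E}L_k(\theta)$, and its marginal summand $\zeta_{i,k}(\theta) \stackrel{\mathrm{def}}{=} \ell_{i,k}(\theta) - \mathbb{E}\ell_{i,k}(\theta)$ for $\ell_{i,k}(\theta)$ defined in (2.1).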

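(ED$_0$) For each $k = 1, \dots, K$ there exist a positive-definite $p_k \times p_k$ symmetric matrix $V_k^2$ and constants $g_k > 0$, $\nu_k \ge 1$ such that $\mathrm{Var}\{\nabla_\theta\zeta_k(\theta^*_k)\} \le V_k^2$ and

$$\sup_{\gamma\in\mathbb{R}^{p_k}}\log\mathbb{E}\exp\Big\{\lambda\frac{\gamma^{\top}\nabla_\theta\zeta_k(\theta^*_k)}{\|V_k\gamma\|}\Big\} \le \nu_k^2\lambda^2/2, \qquad |\lambda| \le g_k.$$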


(ED$_2$) For each $k = 1, \dots, K$ there exist a constant $\omega_k > 0$ and, for each $r > 0$, a constant $g_{2,k}(r)$ such that it holds for all $\theta \in \Theta_{0,k}(r)$ and for $j = 1, 2$

$$\sup_{\substack{\gamma_j\in\mathbb{R}^{p_k}\\ \|\gamma_j\|\le 1}}\log\mathbb{E}\exp\Big\{\frac{\lambda}{\omega_k}\gamma_1^{\top}D_k^{-1}\nabla_\theta^2\zeta_k(\theta)D_k^{-1}\gamma_2\Big\} \le \nu_k^2\lambda^2/2, \qquad |\lambda| \le g_{2,k}(r).$$

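(L$_0$) For each $k = 1, \dots, K$ and for each $r > 0$ there exists a constant $\delta_k(r) > 0$ such that $\delta_k(r) \le 1/2$ for $r \le r_{0,k}$ ($r_{0,k}$ come from condition (B.1) of Theorem B.1 in Section B.1), and for all $\theta \in \Theta_{0,k}(r)$ it holds

$$\|D_k^{-1}D_k^2(\theta)D_k^{-1} - I_{p_k}\| \le \delta_k(r),$$

where $D_k^2(\theta) \stackrel{\mathrm{def}}{=} -\nabla_\theta^2\mathbb{E}L_k(\theta)$ and $\Theta_{0,k}(r) \stackrel{\mathrm{def}}{=} \{\theta \in \Theta_k : \|D_k(\theta - \theta^*_k)\| \le r\}$.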

(I) There exist constants $a_k > 0$ for all $k = 1, \dots, K$ s.t.

$$a_k^2D_k^2 \ge V_k^2.$$

Denote $a^2 \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}a_k^2$.

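(L$_{\mathrm{r}}$) For each $k = 1, \dots, K$ and $r > r_{0,k}$ there exists a value $b_k(r) > 0$ s.t. $r b_k(r) \to \infty$ for $r \to \infty$, and for all $\theta \in \Theta_k$ with $\|D_k(\theta - \theta^*_k)\| = r$ it holds

$$-2\{\mathbb{E}L_k(\theta) - \mathbb{E}L_k(\theta^*_k)\} \ge r^2 b_k(r).$$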

5.2 Conditions required for the bootstrap validity 

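(SmB) There exists a constant $\delta_{\mathrm{smb}} \ge 0$ such that it holds for the matrices $B_k^2$ and $H_k^2$ defined in (3.5):

$$\max_{1\le k\le K}\|H_k^{-1}B_k^2H_k^{-1}\| \le \delta_{\mathrm{smb}}^2 \le C\Big(\frac{n}{p_{\max}^{13}}\Big)^{1/8}\log^{-7/8}(K)\log^{-3/8}(np_{\mathrm{sum}}).$$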

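(ED$_{2m}$) For each $k = 1, \dots, K$, $r > 0$, $i = 1, \dots, n$, $j = 1, 2$ and for all $\theta \in \Theta_{0,k}(r)$ it holds for the values $\omega_k > 0$ and $g_{2,k}(r)$ from condition (ED$_2$):

$$\sup_{\substack{\gamma_j\in\mathbb{R}^{p_k}\\ \|\gamma_j\|\le 1}}\log\mathbb{E}\exp\Big\{\frac{\lambda}{\omega_k}\gamma_1^{\top}D_k^{-1}\nabla_\theta^2\zeta_{i,k}(\theta)D_k^{-1}\gamma_2\Big\} \le \frac{\nu_k^2\lambda^2}{2n}, \qquad |\lambda| \le g_{2,k}(r).$$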

(L$_{0m}$) For each $k = 1, \dots, K$, $r > 0$, $i = 1, \dots, n$ and for all $\theta \in \Theta_{0,k}(r)$ there exists a value $C_{m,k}(r) \ge 0$ such that

$$\|D_k^{-1}\nabla_\theta^2\mathbb{E}\ell_{i,k}(\theta)D_k^{-1}\| \le C_{m,k}(r)\,n^{-1}.$$
(I$_B$) For each $k = 1, \dots, K$ there exists a constant $a_{B,k} > 0$ s.t.

$$a_{B,k}^2D_k^2 \ge B_k^2.$$

Denote $a_B^2 \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}a_{B,k}^2$.

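(SD$_1$) There exists a constant $0 \le \delta_v^2 \le Cp_{\mathrm{sum}}/n$ such that it holds for all $i = 1, \dots, n$ with exponentially high probability

$$\big\|H^{-1}\{g_ig_i^{\top} - \mathbb{E}g_ig_i^{\top}\}H^{-1}\big\| \le \delta_v^2,$$

where

$$g_i \stackrel{\mathrm{def}}{=} \big(\nabla_\theta\ell_{i,1}(\theta^*_1)^{\top}, \dots, \nabla_\theta\ell_{i,K}(\theta^*_K)^{\top}\big)^{\top} \in \mathbb{R}^{p_{\mathrm{sum}}}, \qquad p_{\mathrm{sum}} \stackrel{\mathrm{def}}{=} p_1 + \dots + p_K.$$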

(E$_B$) The i.i.d. bootstrap weights $u_i$ are independent of $Y$, and for all $i = 1, \dots, n$ it holds for some constants $g > 0$, $\nu_0 \ge 1$

$$\mathbb{E}u_i = 1, \qquad \mathrm{Var}\,u_i = 1, \qquad \log\mathbb{E}\exp\{\lambda(u_i - 1)\} \le \nu_0^2\lambda^2/2, \quad |\lambda| \le g.$$
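As a worked check, the Gaussian multipliers $u_i \sim \mathcal{N}(1,1)$ used in Section 4 satisfy (E$_B$). The only fact used is the moment generating function of a standard normal variable.

```latex
% u_i ~ N(1, 1), hence u_i - 1 ~ N(0, 1):
\mathbb{E}u_i = 1, \qquad \mathrm{Var}\,u_i = 1, \qquad
\log\mathbb{E}\exp\{\lambda(u_i - 1)\} = \log e^{\lambda^2/2} = \frac{\lambda^2}{2}
\quad \text{for all } \lambda \in \mathbb{R},
% so (E_B) holds with \nu_0 = 1 and any g > 0.
```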


5.3 Dependence of the involved terms on the sample size and cardinality of the parameters' set


Here we consider the case of i.i.d. observations $Y_1, \dots, Y_n$ and $\mathtt{x} = C\log n$ in order to specify the dependence of the non-asymptotic bounds on $n$ and $p$. In the paper by Spokoiny and Zhilova (2014) (the version of 2015) this is done in detail for the i.i.d. case, the generalized linear model and quantile regression.

Example 5.1 in Spokoiny (2012a) demonstrates that in this situation $g_k = C\sqrt{n}$ and $\omega_k = C/\sqrt{n}$. Then $\mathfrak{z}_k(\mathtt{x}) = C\sqrt{p_k + \mathtt{x}}$ for some constant $C \ge 1.85$, for the function $\mathfrak{z}_k(\mathtt{x})$ given in (B.3) in Section B.1. Similarly, it can be checked that $g_{2,k}(r)$ from condition (ED$_2$) is proportional to $\sqrt{n}$. Due to the independence of the observations,


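$$\log\mathbb{E}\exp\Big\{\frac{\lambda}{\omega_k}\gamma_1^{\top}D_k^{-1}\nabla_\theta^2\zeta_k(\theta)D_k^{-1}\gamma_2\Big\} = \sum_{i=1}^n\log\mathbb{E}\exp\Big\{\frac{\lambda}{C\sqrt{n}}\gamma_1^{\top}d_k^{-1}\nabla_\theta^2\zeta_{i,k}(\theta)d_k^{-1}\gamma_2\Big\} \le n\,\frac{C\lambda^2}{2n} = \frac{C\lambda^2}{2} \quad \text{for } |\lambda| \le \tilde g_{2,k}(r)\sqrt{n},$$

where $\zeta_{i,k}(\theta) \stackrel{\mathrm{def}}{=} \ell_{i,k}(\theta) - \mathbb{E}\ell_{i,k}(\theta)$, $d_k^2 \stackrel{\mathrm{def}}{=} -\nabla_\theta^2\mathbb{E}\ell_{i,k}(\theta^*_k)$ and $D_k^2 = nd_k^2$ in the i.i.d. case. The function $\tilde g_{2,k}(r)$ denotes the marginal analog of $g_{2,k}(r)$.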

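Let us show that for the value $\delta_k(r)$ from condition (L$_0$) it holds $\delta_k(r) = Cr/\sqrt{n}$. Suppose that $\|D_k^{-1}\gamma^{\top}\nabla_\theta^3\mathbb{E}L_k(\theta)D_k^{-1}\| \le C$ for all $\theta \in \Theta_{0,k}(r)$ and all $\gamma \in \mathbb{R}^{p_k}$ with $\|\gamma\| = 1$. Then it holds for some $\bar\theta \in \Theta_{0,k}(r)$:

$$\|D_k^{-1}D_k^2(\theta)D_k^{-1} - I_{p_k}\| = \big\|D_k^{-1}\{(\theta^*_k - \theta)^{\top}\nabla_\theta^3\mathbb{E}L_k(\bar\theta)\}D_k^{-1}\big\|$$
$$\le \|D_k^{-1}\|\,\|D_k(\theta - \theta^*_k)\|\sup_{\|\gamma\|=1}\|D_k^{-1}\gamma^{\top}\nabla_\theta^3\mathbb{E}L_k(\bar\theta)D_k^{-1}\|$$
$$\le r\|D_k^{-1}\|\sup_{\|\gamma\|=1}\|D_k^{-1}\gamma^{\top}\nabla_\theta^3\mathbb{E}L_k(\bar\theta)D_k^{-1}\| \le Cr/\sqrt{n},$$

where the last step uses $\|D_k^{-1}\| = \|d_k^{-1}\|/\sqrt{n}$.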

Similarly, $C_{m,k}(r) \le Cr/\sqrt{n} + C$ in condition (L$_{0m}$).

The next remark helps to check the global identifiability condition (L$_{\mathrm r}$) in many situations. Suppose that the parameter domain $\Theta_k$ is compact and $n$ is sufficiently large. Then the value $b_k(r)$ from condition (L$_{\mathrm r}$) can be taken as $C\{1 - r/\sqrt{n}\} \approx C$. Indeed, for $\theta$ with $\|D_k(\theta - \theta^*_k)\| = r$

$$-2\{\mathbb{E}L_k(\theta) - \mathbb{E}L_k(\theta^*_k)\} \ge r^2\Big\{1 - r\|D_k^{-1}\|\sup_{\|\gamma\|=1}\|D_k^{-1}\gamma^{\top}\nabla_\theta^3\mathbb{E}L_k(\bar\theta)D_k^{-1}\|\Big\} \ge r^2(1 - Cr/\sqrt{n}).$$

Due to the obtained orders, conditions (B.1) and (B.9) of Theorems B.1 and B.5 on concentration of the MLEs $\tilde\theta_k$, $\tilde\theta^{\circ}_k$ require $r_{0,k} \ge C\sqrt{p_k + \mathtt{x}}$.

A Approximation of the joint distributions of $\ell_2$-norms

Let us first introduce some notation:

$1_K \stackrel{\mathrm{def}}{=} (1, \dots, 1)^{\top} \in \mathbb{R}^K$;

$\|\cdot\|$ is the Euclidean norm for a vector and the spectral norm for a matrix;

$\|\cdot\|_{\max}$ is the maximum of the absolute values of the elements of a vector or of a matrix;

$\|\cdot\|_1$ is the sum of the absolute values of the elements of a vector or of a matrix.

Consider $K$ random centered vectors $\phi_k \in \mathbb{R}^{p_k}$ for $k = 1, \dots, K$. Each vector equals a sum of $n$ centered independent vectors:

$$\phi_k = \phi_{k,1} + \dots + \phi_{k,n}, \qquad \mathbb{E}\phi_k = \mathbb{E}\phi_{k,i} = 0 \quad \forall\, 1 \le i \le n. \tag{A.1}$$

Introduce similarly the vectors $\psi_k \in \mathbb{R}^{p_k}$ for $k = 1, \dots, K$:

$$\psi_k = \psi_{k,1} + \dots + \psi_{k,n}, \qquad \mathbb{E}\psi_k = \mathbb{E}\psi_{k,i} = 0 \quad \forall\, 1 \le i \le n, \tag{A.2}$$

with the same independence properties as $\phi_{k,i}$, and also independent of all $\phi_{k,i}$.

The goal of this section is to compare the joint distributions of the $\ell_2$-norms of the sets of vectors $\phi_k$ and $\psi_k$, $k = 1, \dots, K$. That is, we compare the probability laws $\mathcal{L}(\|\phi_1\|, \dots, \|\phi_K\|)$ and $\mathcal{L}(\|\psi_1\|, \dots, \|\psi_K\|)$, assuming that their correlation structures are close to each other.

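Denote

$$p_{\max} \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}p_k, \qquad p_{\mathrm{sum}} \stackrel{\mathrm{def}}{=} p_1 + \dots + p_K,$$
$$\lambda^2_{\phi,\max} \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}\|\mathrm{Var}(\phi_k)\|, \qquad \lambda^2_{\psi,\max} \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}\|\mathrm{Var}(\psi_k)\|,$$
$$z_{\max} \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}z_k, \qquad z_{\min} \stackrel{\mathrm{def}}{=} \min_{1\le k\le K}z_k,$$
$$\delta_{z,\max} \stackrel{\mathrm{def}}{=} \max_{1\le k\le K}\delta_{z_k}, \qquad \delta_{z,\min} \stackrel{\mathrm{def}}{=} \min_{1\le k\le K}\delta_{z_k},$$

let also

$$\Delta_\varepsilon \stackrel{\mathrm{def}}{=} \Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/16}(K)\log^{3/8}(np_{\mathrm{sum}})\max\{\lambda_{\phi,\max}, \lambda_{\psi,\max}\}^{3/4}\log^{1/8}(5n^{1/2}). \tag{A.3}$$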


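The following conditions are required for Proposition A.1.

(C1) For some $g_k, \nu_k, c_\phi, c_\psi > 0$ and for all $i = 1, \dots, n$, $k = 1, \dots, K$

$$\sup_{\gamma_k\in\mathbb{R}^{p_k},\,\|\gamma_k\|=1}\log\mathbb{E}\exp\big\{\lambda\sqrt{n}\,\gamma_k^{\top}\phi_{k,i}/c_\phi\big\} \le \lambda^2\nu_k^2/2, \qquad |\lambda| < g_k,$$
$$\sup_{\gamma_k\in\mathbb{R}^{p_k},\,\|\gamma_k\|=1}\log\mathbb{E}\exp\big\{\lambda\sqrt{n}\,\gamma_k^{\top}\psi_{k,i}/c_\psi\big\} \le \lambda^2\nu_k^2/2, \qquad |\lambda| < g_k,$$

where $c_\phi \ge C\lambda_{\phi,\max}$ and $c_\psi \ge C\lambda_{\psi,\max}$.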

(C2) For some $\delta_\Sigma^2 \ge 0$

$$\max_{1\le k_1,k_2\le K}\|\mathrm{Cov}(\phi_{k_1},\phi_{k_2}) - \mathrm{Cov}(\psi_{k_1},\psi_{k_2})\|_{\max} \le \delta_\Sigma^2. \tag{A.4}$$


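Proposition A.1 (Approximation of the joint distributions of $\ell_2$-norms). Consider the centered random vectors $\phi_1, \dots, \phi_K$ and $\psi_1, \dots, \psi_K$ given in (A.1) and (A.2). Let conditions (C1) and (C2) be fulfilled. Let the values $z_k \ge \sqrt{p_k} + \Delta_\varepsilon$ and $\delta_{z_k} \ge 0$ be s.t. $C\max\{n^{-1/2}, \delta_{z,\max}\} \le \Delta_\varepsilon \le Cz_{\max}^{-1}$. Then it holds with dominating probability

$$\mathbb{P}\Big(\bigcup_{k=1}^K\{\|\phi_k\| > z_k\}\Big) - \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\psi_k\| > z_k - \delta_{z_k}\}\Big) \ge -\Delta_{\ell_2},$$
$$\mathbb{P}\Big(\bigcup_{k=1}^K\{\|\phi_k\| > z_k\}\Big) - \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\psi_k\| > z_k + \delta_{z_k}\}\Big) \le \Delta_{\ell_2}$$

for the deterministic non-negative value

$$\Delta_{\ell_2} \le 12.5\,C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{\mathrm{sum}})\max\{\lambda_{\phi,\max},\lambda_{\psi,\max}\}^{3/4}$$
$$\qquad + 3.2\,C\,\delta_\Sigma^2\,(\cdots)^{1/4}\log^2(K)\log^{3/4}(np_{\mathrm{sum}})\max\{\lambda_{\phi,\max},\lambda_{\psi,\max}\}^{7/2}$$
$$\le 25\,C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{\mathrm{sum}})\max\{\lambda_{\phi,\max},\lambda_{\psi,\max}\}^{3/4},$$

where the last inequality holds for

$$\delta_\Sigma^2 \le 4C\Big(\frac{n}{p_{\max}^{13}}\Big)^{1/8}\log^{-7/8}(K)\log^{-3/8}(np_{\mathrm{sum}})\max\{\lambda_{\phi,\max},\lambda_{\psi,\max}\}^{-11/4}.$$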


Remark A.1. The approximating error term $\Delta_{\ell_2}$ consists of three errors, which correspond to the Gaussian approximation result (Lemma A.2), the Gaussian comparison (Lemma A.7), and the anti-concentration inequality (Lemma A.8). The bound on $\Delta_{\ell_2}$ above implies that the number $K$ of the random vectors $\phi_1, \dots, \phi_K$ should satisfy $\log K \ll (n/p_{\max}^3)^{1/12}$ in order to keep the approximating error term $\Delta_{\ell_2}$ small. This condition can be relaxed by using a sharper Gaussian approximation result. For instance, Lemma A.2 could use the Slepian-Stein technique plus the induction argument from the recent paper by Chernozhukov et al. (2014b) instead of Lindeberg's approach. This would lead to the improved bound $C(p_{\max}^3/n)^{1/6}$ multiplied by a logarithmic term.


A.1 Joint Gaussian approximation of the $\ell_2$-norms of sums of independent vectors by Lindeberg's method

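Introduce the following random vectors from $\mathbb{R}^{p_{\mathrm{sum}}}$:

$$\Phi_i \stackrel{\mathrm{def}}{=} (\phi_{1,i}^{\top}, \dots, \phi_{K,i}^{\top})^{\top}, \qquad \Phi \stackrel{\mathrm{def}}{=} \sum_{i=1}^n\Phi_i = (\phi_1^{\top}, \dots, \phi_K^{\top})^{\top}, \qquad \mathbb{E}\Phi = \mathbb{E}\Phi_i = 0. \tag{A.5}$$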
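Define their Gaussian analogs as follows:

$$\bar\Phi_i \sim \mathcal{N}(0, \mathrm{Var}\,\Phi_i), \qquad \bar\Phi \stackrel{\mathrm{def}}{=} \sum_{i=1}^n\bar\Phi_i \sim \mathcal{N}(0, \mathrm{Var}\,\Phi), \tag{A.6}$$
$$\bar\phi_{k,i} \sim \mathcal{N}(0, \mathrm{Var}\,\phi_{k,i}), \tag{A.7}$$
$$\bar\phi_k \stackrel{\mathrm{def}}{=} \sum_{i=1}^n\bar\phi_{k,i} \sim \mathcal{N}(0, \mathrm{Var}\,\phi_k). \tag{A.8}$$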


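Lemma A.2 (Joint GAR with equal covariance matrices). Consider the sets of random vectors $\phi_j$ and $\bar\phi_j$, $j = 1, \dots, K$, defined in (A.1) and (A.5)-(A.8). If the conditions of Lemmas A.4 and A.5 are fulfilled, then it holds with dominating probability for all $\Delta, \beta > 0$ and $z_j \ge \max\{\Delta + \sqrt{p_j}, 2.25\log(K)/\beta\}$:

$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\Big) \le \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\bar\phi_j\| > z_j - \Delta - \frac{3\log(K)}{2\beta}\Big\}\Big) + \delta_{3,\phi}(\Delta,\beta),$$
$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\Big) \ge \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\bar\phi_j\| > z_j + \Delta + \frac{3\log(K)}{2\beta}\Big\}\Big) - \delta_{3,\phi}(\Delta,\beta),$$

where $\delta_{3,\phi}(\Delta,\beta) \le C\big(\frac{\beta^2}{\Delta} + \frac{\beta}{\Delta^2} + \frac{1}{\Delta^3}\big)\big\{\frac{p_{\max}^3}{n}\log(K)\log^3(np_{\mathrm{sum}})\big\}^{1/2}$ is given in (A.15).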

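Proof of Lemma A.2.

$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\Big) = \mathbb{E}\,\mathbb{1}\Big(\max_{1\le j\le K}\{\|\phi_j\|^2 - z_j^2\} > 0\Big).$$

Let us approximate the function $\max_{1\le j\le K}$ using the smooth maximum:

$$h_\beta(\{x_j\}) \stackrel{\mathrm{def}}{=} \beta^{-1}\log\Big(\sum_{j=1}^Ke^{\beta x_j}\Big) \quad \text{for } \beta > 0,\ x_j \in \mathbb{R},$$
$$h_\beta(\{x_j\}) - \beta^{-1}\log(K) \le \max_j\{x_j\} \le h_\beta(\{x_j\}). \tag{A.9}$$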


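The indicator function $\mathbb{1}\{x > 0\}$ is approximated with the following function $g(x)$. It grows monotonically from 0 to 1, is twice continuously differentiable, and has a bounded (piecewise constant) third derivative:

$$g(x) \stackrel{\mathrm{def}}{=} \begin{cases} 0, & x \le 0,\\ 16x^3/3, & x \in [0, 1/4],\\ 0.5 + 2(x - 0.5) - 16(x - 0.5)^3/3, & x \in [1/4, 3/4],\\ 1 + 16(x - 1)^3/3, & x \in [3/4, 1],\\ 1, & x \ge 1. \end{cases}$$

It holds for all $x \in \mathbb{R}$ and $\Delta > 0$

$$\mathbb{1}\{x > \Delta\} \le g(x/\Delta) \le \mathbb{1}\{x/\Delta > 0\}.$$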
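The two smoothing devices used in this proof, the smooth maximum $h_\beta$ in (A.9) and the smoothed step $g$, can be checked numerically. The sketch below computes $h_\beta$ with the usual log-sum-exp shift for numerical stability. It also evaluates $g$ piecewise, exactly as defined above. The function names are illustrative.

```python
import numpy as np


def smooth_max(x, beta):
    """h_beta({x_j}) = beta^{-1} log sum_j exp(beta x_j), computed stably via a max shift."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(beta * (x - m)))) / beta


def g(x):
    """Piecewise-cubic smoothed step from the proof of Lemma A.2: 0 for x <= 0, 1 for x >= 1."""
    x = np.asarray(x, dtype=float)
    return np.select(
        [x <= 0.0, x <= 0.25, x <= 0.75, x <= 1.0],
        [0.0,
         16.0 * x ** 3 / 3.0,
         0.5 + 2.0 * (x - 0.5) - 16.0 * (x - 0.5) ** 3 / 3.0,
         1.0 + 16.0 * (x - 1.0) ** 3 / 3.0],
        default=1.0,
    )
```

The tests below check the sandwich (A.9), the values $g(1/4) = 1/12$ and $g(3/4) = 11/12$ where the pieces join, monotonicity of $g$, and the bounds $\mathbb{1}\{x > \Delta\} \le g(x/\Delta) \le \mathbb{1}\{x > 0\}$.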
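Therefore

$$\mathbb{P}\Big(\max_{1\le j\le K}\Big\{\frac{\|\phi_j\|^2 - z_j^2}{2z_j}\Big\} > \Delta\Big) \le \mathbb{E}\,g\Big(\max_{1\le j\le K}\Big\{\frac{\|\phi_j\|^2 - z_j^2}{2z_j\Delta}\Big\}\Big)$$
$$\le \mathbb{E}\,g\Big(\frac{1}{\beta\Delta}\log\Big[\sum_{j=1}^K\exp\Big\{\beta\frac{\|\phi_j\|^2 - z_j^2}{2z_j}\Big\}\Big]\Big) \tag{A.10}$$
$$\le \mathbb{E}\,g\Big(\max_{1\le j\le K}\Big\{\frac{\|\phi_j\|^2 - z_j^2}{2z_j\Delta}\Big\} + \frac{\log(K)}{\beta\Delta}\Big)$$
$$\le \mathbb{E}\,\mathbb{1}\Big(\max_{1\le j\le K}\Big\{\frac{\|\phi_j\|^2 - z_j^2}{2z_j}\Big\} > -\frac{\log(K)}{\beta}\Big) \le \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\phi_j\| > z_j - \frac{3\log(K)}{2\beta}\Big\}\Big), \tag{A.11}$$

where the last inequality holds for $z_j \ge 2.25\log(K)/\beta$. Denote

$$z \stackrel{\mathrm{def}}{=} (z_1, \dots, z_K)^{\top} \in \mathbb{R}^K_{>0}.$$

Introduce the function $F_{\Delta,\beta}: \mathbb{R}^{p_{\mathrm{sum}}} \times \mathbb{R}^K \to \mathbb{R}$:

$$F_{\Delta,\beta}(\Phi, z) \stackrel{\mathrm{def}}{=} g\Big(\frac{1}{\Delta\beta}\log\Big[\sum_{j=1}^K\exp\Big\{\beta\frac{\|\phi_j\|^2 - z_j^2}{2z_j}\Big\}\Big]\Big). \tag{A.12}$$

Then by (A.10) and (A.11)

$$\mathbb{P}\Big(\max_{1\le j\le K}\Big\{\frac{\|\phi_j\|^2 - z_j^2}{2z_j}\Big\} > \Delta\Big) \le \mathbb{E}F_{\Delta,\beta}(\Phi, z) \tag{A.13}$$
$$\le \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\phi_j\| > z_j - \frac{3\log(K)}{2\beta}\Big\}\Big). \tag{A.14}$$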

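Lemma A.6 checks that $F_{\Delta,\beta}(\cdot, z)$ admits Lindeberg's telescopic sum device (see Lindeberg (1922)), which we use to approximate $\mathbb{E}F_{\Delta,\beta}(\Phi, z)$ with $\mathbb{E}F_{\Delta,\beta}(\bar\Phi, z)$. Define for $q = 2, \dots, n-1$ the following $\mathbb{R}^{p_{\mathrm{sum}}}$-valued random sums:

$$S_q \stackrel{\mathrm{def}}{=} \sum_{i=1}^{q-1}\bar\Phi_i + \sum_{i=q+1}^n\Phi_i, \qquad S_1 \stackrel{\mathrm{def}}{=} \sum_{i=2}^n\Phi_i, \qquad S_n \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n-1}\bar\Phi_i.$$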
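The difference $F_{\Delta,\beta}(\Phi, z) - F_{\Delta,\beta}(\bar\Phi, z)$ can be represented as the telescopic sum:

$$F_{\Delta,\beta}(\Phi, z) - F_{\Delta,\beta}(\bar\Phi, z) = \sum_{i=1}^n\{F_{\Delta,\beta}(S_i + \Phi_i, z) - F_{\Delta,\beta}(S_i + \bar\Phi_i, z)\}.$$

Consider the third-order Taylor expansions of $F_{\Delta,\beta}(S_i + \Phi_i, z)$ and $F_{\Delta,\beta}(S_i + \bar\Phi_i, z)$ w.r.t. the first argument at $S_i$. Together with Lemma A.6, they imply for each $i = 1, \dots, n$:

$$\Big|F_{\Delta,\beta}(S_i + \Phi_i, z) - F_{\Delta,\beta}(S_i + \bar\Phi_i, z) - \nabla_\Phi F_{\Delta,\beta}(S_i, z)^{\top}(\Phi_i - \bar\Phi_i) - \frac{1}{2}(\Phi_i - \bar\Phi_i)^{\top}\nabla_\Phi^2F_{\Delta,\beta}(S_i, z)(\Phi_i + \bar\Phi_i)\Big|$$
$$\le C_3(\Delta,\beta)\Big(\max_{1\le j\le K}\{\|S_{j,i} + \phi_{j,i}\|^3\}\|\Phi_i\|_{\max}^3 + \max_{1\le j\le K}\{\|S_{j,i} + \bar\phi_{j,i}\|^3\}\|\bar\Phi_i\|_{\max}^3\Big).$$

Here the value $C_3(\Delta,\beta)$ is defined in Lemma A.6. The random vectors $S_{j,i} \in \mathbb{R}^{p_j}$, $j = 1, \dots, K$, are defined for all $i = 1, \dots, n$ by

$$S_i = (S_{1,i}^{\top}, S_{2,i}^{\top}, \dots, S_{K,i}^{\top})^{\top}.$$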

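By construction, $S_i$ is independent of $\Phi_i$ and $\bar\Phi_i$, $\mathbb{E}\Phi_i = \mathbb{E}\bar\Phi_i = 0$, and $\mathrm{Var}\,\Phi_i = \mathrm{Var}\,\bar\Phi_i$. Therefore

$$|\mathbb{E}F_{\Delta,\beta}(\Phi, z) - \mathbb{E}F_{\Delta,\beta}(\bar\Phi, z)| = \Big|\sum_{i=1}^n\{\mathbb{E}F_{\Delta,\beta}(S_i + \Phi_i, z) - \mathbb{E}F_{\Delta,\beta}(S_i + \bar\Phi_i, z)\}\Big|$$
$$\le C_3(\Delta,\beta)\sum_{i=1}^n\mathbb{E}\Big(\max_{1\le j\le K}\{\|S_{j,i} + \phi_{j,i}\|^3\}\|\Phi_i\|_{\max}^3 + \max_{1\le j\le K}\{\|S_{j,i} + \bar\phi_{j,i}\|^3\}\|\bar\Phi_i\|_{\max}^3\Big).$$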


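Lemma A.5 implies for all $i = 1, \dots, n$ with probability $\ge 1 - 2e^{-\mathtt{x}}$

$$\Big(\mathbb{E}\max_{1\le j\le K}\{\|S_{j,i} + \phi_{j,i}\|^6\}\Big)^{1/2} \le C\nu_0\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|^3\sqrt{2p_{\max}\log(K)}\,(p_{\max} + 6\mathtt{x}),$$

and the same bound holds for $\big(\mathbb{E}\max_{1\le j\le K}\{\|S_{j,i} + \bar\phi_{j,i}\|^6\}\big)^{1/2}$. Denote

$$\delta_\Phi \stackrel{\mathrm{def}}{=} \sum_{i=1}^n\Big[\{\mathbb{E}(\|\Phi_i\|_{\max}^6)\}^{1/2} + \{\mathbb{E}(\|\bar\Phi_i\|_{\max}^6)\}^{1/2}\Big].$$

By Lemma A.4 it holds for $t = (\mathtt{x} + \log p_{\mathrm{sum}})^3(2\nu_0^2c_\phi^2)^3/n^3$ with probability $\ge 1 - e^{-\mathtt{x}}$

$$\|\Phi_i\|_{\max}^6 \le t, \qquad \|\bar\Phi_i\|_{\max}^6 \le t.$$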


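If $\mathtt{x} = C\log n$, then the last bound on $|\mathbb{E}F_{\Delta,\beta}(\Phi, z) - \mathbb{E}F_{\Delta,\beta}(\bar\Phi, z)|$ continues with probability $\ge 1 - 6\exp(-\mathtt{x})$ as follows:

$$|\mathbb{E}F_{\Delta,\beta}(\Phi, z) - \mathbb{E}F_{\Delta,\beta}(\bar\Phi, z)| \le C\,C_3(\Delta,\beta)\,\nu_0\sqrt{2p_{\max}\log(K)}\,(p_{\max} + 6\mathtt{x})\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|^3\,\delta_\Phi$$
$$\le C\Big(\frac{\beta^2}{\Delta} + \frac{\beta}{\Delta^2} + \frac{1}{\Delta^3}\Big)\Big(\frac{p_{\max}^3}{n}\Big)^{1/2}\log^{1/2}(K)\log^{3/2}(np_{\mathrm{sum}})\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|^3(2\nu_0^2c_\phi^2)^{3/2}$$
$$\stackrel{\mathrm{def}}{=} \delta_{3,\phi}(\Delta,\beta). \tag{A.15}$$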


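The derived bounds imply:

$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\Big) \stackrel{\text{by (A.13)}}{\le} \mathbb{E}F_{\Delta,\beta}(\Phi, z - \Delta 1_K) \stackrel{\text{by (A.15)}}{\le} \mathbb{E}F_{\Delta,\beta}(\bar\Phi, z - \Delta 1_K) + \delta_{3,\phi}(\Delta,\beta)$$
$$\stackrel{\text{by (A.14)}}{\le} \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\bar\phi_j\| > z_j - \Delta - \frac{3\log(K)}{2\beta}\Big\}\Big) + \delta_{3,\phi}(\Delta,\beta), \tag{A.16}$$

and similarly

$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\Big) \ge \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\bar\phi_j\| > z_j + \Delta + \frac{3\log(K)}{2\beta}\Big\}\Big) - \delta_{3,\phi}(\Delta,\beta).$$

□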


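The next lemma is formulated separately because it is used in the proof of another result.

Lemma A.3 (Smooth uniform GAR). Under the conditions of Lemma A.2 it holds with dominating probability for the function $F_{\Delta,\beta}(\cdot, z)$ given in (A.12):

1.1. $\mathbb{P}\big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\big) \le \mathbb{E}F_{\Delta,\beta}(\bar\Phi, z - \Delta 1_K) + \delta_{3,\phi}(\Delta,\beta)$;

1.2. $\mathbb{P}\big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j\}\big) \ge \mathbb{E}F_{\Delta,\beta}\big(\bar\Phi, z + \frac{3\log(K)}{2\beta}1_K\big) - \delta_{3,\phi}(\Delta,\beta)$;

2.1. $\mathbb{E}F_{\Delta,\beta}(\bar\Phi, z) \le \mathbb{P}\big(\bigcup_{j=1}^K\{\|\bar\phi_j\| > z_j - \frac{3\log(K)}{2\beta}\}\big)$;

2.2. $\mathbb{E}F_{\Delta,\beta}(\bar\Phi, z) \ge \mathbb{P}\big(\bigcup_{j=1}^K\{\|\bar\phi_j\| > z_j + \Delta\}\big)$.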
Proof of Lemma A.3. Inequality 1.1 is obtained in (A.16); inequality 1.2 follows similarly from (A.14) and (A.15). Inequalities 2.1 and 2.2 are given in (A.13) and (A.14). □


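Lemma A.4. Let for some $c_\phi, g_1, \nu_0 > 0$ and for all $i = 1, \dots, n$, $j = 1, \dots, p_{\mathrm{sum}}$

$$\log\mathbb{E}\exp\{\lambda\sqrt{n}\,|\phi_i^j|/c_\phi\} \le \lambda^2\nu_0^2/2, \qquad |\lambda| \le g_1,$$

where $\phi_i^j$ denotes the $j$-th coordinate of the vector $\Phi_i$. Then it holds for all $i = 1, \dots, n$ and $m, t > 0$

$$\mathbb{P}\Big(\max_{1\le j\le p_{\mathrm{sum}}}|\phi_i^j|^m > t\Big) \le \exp\Big\{-\frac{nt^{2/m}}{2\nu_0^2c_\phi^2} + \log(p_{\mathrm{sum}})\Big\}.$$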

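Proof of Lemma A.4. Let us bound $\max_j|\phi_i^j|$ using the following bound for the maximum:

$$\max_{1\le j\le p_{\mathrm{sum}}}|\phi_i^j| \le \log\Big\{\sum_{j=1}^{p_{\mathrm{sum}}}\exp|\phi_i^j|\Big\}.$$

By the lemma's condition

$$\mathbb{E}\exp\Big(\max_{1\le j\le p_{\mathrm{sum}}}\frac{\lambda\sqrt{n}}{c_\phi}|\phi_i^j|\Big) \le \exp\big(\lambda^2\nu_0^2/2 + \log p_{\mathrm{sum}}\big).$$

Thus, the statement follows from the exponential Chebyshev inequality. □

Lemma A.5. Let the centered random vectors $\phi_j \in \mathbb{R}^{p_j}$, $j = 1, \dots, K$, satisfy

$$\sup_{\gamma\in\mathbb{R}^{p_j},\,\gamma\ne 0}\log\mathbb{E}\exp\Big\{\lambda\frac{\gamma^{\top}\phi_j}{\|\mathrm{Var}^{1/2}(\phi_j)\gamma\|}\Big\} \le \nu_0^2\lambda^2/2, \qquad |\lambda| \le g,$$

for some constants $\nu_0 > 0$ and $g \ge \nu_0^{-1}\max_{1\le j\le K}\sqrt{2p_j\log(K)}$. Then

$$\mathbb{E}\max_{1\le j\le K}\{\|\phi_j\|\} \le C\nu_0\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|\sqrt{2p_{\max}\log(K)},$$
$$\Big(\mathbb{E}\max_{1\le j\le K}\{\|\phi_j\|^6\}\Big)^{1/2} \le C\nu_0\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|^3\sqrt{2p_{\max}\log(K)}\,(p_{\max} + 6\mathtt{x}).$$

The second bound holds with probability $\ge 1 - 2e^{-\mathtt{x}}$.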

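Proof of Lemma A.5. Let us take for each $j = 1, \dots, K$ finite $\varepsilon_j$-grids $G_j(\varepsilon_j) \subset \mathbb{R}^{p_j}$ on the $(p_j - 1)$-spheres of radius 1 s.t.

$$\forall\gamma \in \mathbb{R}^{p_j} \text{ s.t. } \|\gamma\| = 1\ \ \exists\gamma_0 \in G_j(\varepsilon_j): \|\gamma - \gamma_0\| \le \varepsilon_j,\ \|\gamma_0\| = 1.$$

Then

$$\|\phi_j\| \le (1 - \varepsilon_j)^{-1}\max_{\gamma\in G_j(\varepsilon_j)}\gamma^{\top}\phi_j.$$

Hence, by inequality (A.9) and the imposed condition it holds for all $0 < \mu \le g/\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|$:

$$\mathbb{E}\max_{1\le j\le K}\{\|\phi_j\|\} \le \max_{1\le j\le K}\frac{1}{1 - \varepsilon_j}\,\mathbb{E}\max_{1\le j\le K}\max_{\gamma\in G_j(\varepsilon_j)}\gamma^{\top}\phi_j$$
$$\le \frac{C}{\mu}\,\mathbb{E}\log\Big\{\sum_{1\le j\le K}\sum_{\gamma\in G_j(\varepsilon_j)}\exp(\mu\gamma^{\top}\phi_j)\Big\} \le \frac{C}{\mu}\log\Big\{\sum_{1\le j\le K}\sum_{\gamma\in G_j(\varepsilon_j)}\mathbb{E}\exp(\mu\gamma^{\top}\phi_j)\Big\}$$
$$\le C\,\frac{\log(K) + \max_{1\le j\le K}\log[\mathrm{card}\{G_j(\varepsilon_j)\}]}{\mu} + C\frac{\mu\nu_0^2}{2}\max_{1\le j\le K}\|\mathrm{Var}(\phi_j)\|$$
$$\le C\,\frac{\max_{1\le j\le K}p_j\log(K)}{\mu} + C\frac{\mu\nu_0^2}{2}\max_{1\le j\le K}\|\mathrm{Var}(\phi_j)\| \le C\nu_0\max_{1\le j\le K}\{\sqrt{p_j}\}\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|\sqrt{2\log(K)}$$

for $\mu = \nu_0^{-1}\max_{1\le j\le K}\{\sqrt{2p_j\log(K)}\}/\max_{1\le j\le K}\|\mathrm{Var}^{1/2}(\phi_j)\|$.

For the second part of the statement we combine the first part with the result of Theorem B.3 on the deviation of a random quadratic form. It holds with dominating probability for $V_j^2 \stackrel{\mathrm{def}}{=} \mathrm{Var}\,\phi_j$:

$$\|\phi_j\|^2 \le \mathfrak{z}^2(\mathtt{x}, V_j^2) \le \mathrm{tr}(V_j^2) + 6\mathtt{x}\|V_j^2\| \le \|V_j^2\|(p_j + 6\mathtt{x}).$$

□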


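Lemma A.6. Let $\Gamma \in \mathbb{R}^{p_{\mathrm{sum}}}$ and $\gamma_j \in \mathbb{R}^{p_j}$, $j = 1, \dots, K$, be s.t. $\Gamma = (\gamma_1^{\top}, \dots, \gamma_K^{\top})^{\top}$. Let $z = (z_1, \dots, z_K)^{\top}$ be s.t. $z_j \ge \sqrt{p_j}$. Then it holds for the function $F_{\Delta,\beta}(\cdot, z)$ defined in (A.12):

$$\|\nabla_\Gamma^2F_{\Delta,\beta}(\Gamma, z)\|_1 \le C_2(\Delta,\beta)\max_{1\le j\le K}\{\|\gamma_j\|^2\}, \qquad C_2(\Delta,\beta) = C\Big(\frac{\beta}{\Delta} + \frac{1}{\Delta^2}\Big),$$
$$\|\nabla_\Gamma^3F_{\Delta,\beta}(\Gamma, z)\|_1 \le C_3(\Delta,\beta)\max_{1\le j\le K}\{\|\gamma_j\|^3\}, \qquad C_3(\Delta,\beta) = C\Big(\frac{\beta^2}{\Delta} + \frac{\beta}{\Delta^2} + \frac{1}{\Delta^3}\Big).$$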
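Proof of Lemma A.6. Denote

$$s(\Gamma) \stackrel{\mathrm{def}}{=} \sum_{j=1}^K\exp\Big(\beta\frac{\|\gamma_j\|^2 - z_j^2}{2z_j}\Big), \qquad h_\beta(s(\Gamma)) \stackrel{\mathrm{def}}{=} \beta^{-1}\log\{s(\Gamma)\}, \tag{A.17}$$

then $F_{\Delta,\beta}(\Gamma, z) = g(\Delta^{-1}h_\beta(s(\Gamma)))$. Let $\gamma^q$ denote the $q$-th coordinate of the vector $\Gamma \in \mathbb{R}^{p_{\mathrm{sum}}}$, and write $h \stackrel{\mathrm{def}}{=} h_\beta(s(\Gamma))$ for short. It holds for $q, l, b = 1, \dots, p_{\mathrm{sum}}$:

$$\frac{\partial^2F_{\Delta,\beta}(\Gamma, z)}{\partial\gamma^q\partial\gamma^l} = \frac{1}{\Delta^2}g''(\Delta^{-1}h)\frac{\partial h}{\partial\gamma^q}\frac{\partial h}{\partial\gamma^l} + \frac{1}{\Delta}g'(\Delta^{-1}h)\frac{\partial^2h}{\partial\gamma^q\partial\gamma^l},$$
$$\frac{\partial^3F_{\Delta,\beta}(\Gamma, z)}{\partial\gamma^q\partial\gamma^l\partial\gamma^b} = \frac{1}{\Delta^3}g'''(\Delta^{-1}h)\frac{\partial h}{\partial\gamma^q}\frac{\partial h}{\partial\gamma^l}\frac{\partial h}{\partial\gamma^b} + \frac{1}{\Delta^2}g''(\Delta^{-1}h)\Big\{\frac{\partial^2h}{\partial\gamma^q\partial\gamma^l}\frac{\partial h}{\partial\gamma^b} + \frac{\partial^2h}{\partial\gamma^q\partial\gamma^b}\frac{\partial h}{\partial\gamma^l} + \frac{\partial^2h}{\partial\gamma^l\partial\gamma^b}\frac{\partial h}{\partial\gamma^q}\Big\} + \frac{1}{\Delta}g'(\Delta^{-1}h)\frac{\partial^3h}{\partial\gamma^q\partial\gamma^l\partial\gamma^b}.$$

For $1 \le q \le p_{\mathrm{sum}}$, let $j(q)$ denote the index from 1 to $K$ s.t. the coordinate $\gamma^q$ of the vector $\Gamma = (\gamma_1^{\top}, \dots, \gamma_K^{\top})^{\top}$ belongs to its sub-vector $\gamma_{j(q)}$. Let also $e_j \stackrel{\mathrm{def}}{=} \exp\{\beta(\|\gamma_j\|^2 - z_j^2)/(2z_j)\}$. Then

$$\frac{\partial h}{\partial\gamma^q} = \frac{1}{\beta s(\Gamma)}\frac{\partial s(\Gamma)}{\partial\gamma^q} = \frac{1}{s(\Gamma)}\frac{\gamma^q}{z_{j(q)}}e_{j(q)},$$
$$\frac{\partial^2h}{\partial\gamma^q\partial\gamma^l} = \frac{1}{\beta s(\Gamma)}\frac{\partial^2s(\Gamma)}{\partial\gamma^q\partial\gamma^l} - \frac{1}{\beta s^2(\Gamma)}\frac{\partial s(\Gamma)}{\partial\gamma^q}\frac{\partial s(\Gamma)}{\partial\gamma^l} = \frac{e_{j(q)}}{s(\Gamma)}\Big\{\frac{\mathbb{1}(q = l)}{z_{j(q)}} + \mathbb{1}(j(q) = j(l))\frac{\beta\gamma^q\gamma^l}{z_{j(q)}^2}\Big\} - \frac{\beta\gamma^q\gamma^l}{s^2(\Gamma)z_{j(q)}z_{j(l)}}e_{j(q)}e_{j(l)}.$$

By definition (A.17) of $s(\Gamma)$ it holds for all $\Gamma \in \mathbb{R}^{p_{\mathrm{sum}}}$

$$\frac{1}{s(\Gamma)}\sum_{j=1}^Ke_j = 1.$$

Therefore,

$$\sum_{q,l=1}^{p_{\mathrm{sum}}}\Big|\frac{\partial h}{\partial\gamma^q}\frac{\partial h}{\partial\gamma^l}\Big| = \Big(\sum_{j=1}^K\frac{e_j}{s(\Gamma)}\frac{\|\gamma_j\|_1}{z_j}\Big)^2 \le \max_{1\le j\le K}\frac{\|\gamma_j\|_1^2}{z_j^2} \le \max_{1\le j\le K}\|\gamma_j\|^2 \quad \text{for } z_j \ge \sqrt{p_j}.$$

Similarly,

$$\sum_{q,l=1}^{p_{\mathrm{sum}}}\Big|\frac{\partial^2h}{\partial\gamma^q\partial\gamma^l}\Big| \le C\beta\max_{1\le j\le K}\|\gamma_j\|^2, \qquad \sum_{q,l,b=1}^{p_{\mathrm{sum}}}\Big|\frac{\partial^3h}{\partial\gamma^q\partial\gamma^l\partial\gamma^b}\Big| \le C(\beta + \beta^2)\max_{1\le j\le K}\|\gamma_j\|^3.$$

These bounds, combined with the above expressions for the derivatives of $F_{\Delta,\beta}$ and the boundedness of $|g'|$, $|g''|$, $|g'''|$, yield the statement. □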


A.2 Gaussian comparison 

The following lemma shows how to compare the expected values of a twice differentiable function evaluated at independent centered Gaussian vectors. This statement is used for the Gaussian comparison step in the scheme (3.1). The proof is based on the Gaussian interpolation method introduced by Stein (1981) and Slepian (1962) (see also Rollin (2013), Chernozhukov et al. (2013b) and references therein). The proof is given here in order to keep the text self-contained.


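Lemma A.7 (Gaussian comparison using Slepian interpolation). Let the $\mathbb{R}^{p_{\mathrm{sum}}}$-dimensional random centered vectors $\bar\Phi$ and $\bar\Psi$ be independent and normally distributed. Let $f(Z): \mathbb{R}^{p_{\mathrm{sum}}} \to \mathbb{R}$ be any twice differentiable function s.t. the expected values in the expression below are bounded. Then it holds

$$|\mathbb{E}f(\bar\Phi) - \mathbb{E}f(\bar\Psi)| \le \frac{1}{2}\|\mathrm{Var}\,\bar\Phi - \mathrm{Var}\,\bar\Psi\|_{\max}\sup_{t\in[0,1]}\big\|\mathbb{E}\nabla^2f\big(\bar\Phi\sqrt{t} + \bar\Psi\sqrt{1-t}\big)\big\|_1.$$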


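Proof of Lemma A.7. Introduce for $t \in [0,1]$ the Gaussian vector process $Z_t$ and the deterministic scalar-valued function $\varkappa(t)$:

$$Z_t \stackrel{\mathrm{def}}{=} \bar\Phi\sqrt{t} + \bar\Psi\sqrt{1-t} \in \mathbb{R}^{p_{\mathrm{sum}}}, \qquad \varkappa(t) \stackrel{\mathrm{def}}{=} \mathbb{E}f(Z_t),$$

then $\mathbb{E}f(\bar\Phi) = \varkappa(1)$, $\mathbb{E}f(\bar\Psi) = \varkappa(0)$ and

$$|\mathbb{E}f(\bar\Phi) - \mathbb{E}f(\bar\Psi)| = |\varkappa(1) - \varkappa(0)| \le \int_0^1|\varkappa'(t)|\,dt.$$

Let us consider $\varkappa'(t)$:

$$\varkappa'(t) = \frac{d}{dt}\mathbb{E}f(Z_t) = \mathbb{E}\Big\{\nabla f(Z_t)^{\top}\frac{d}{dt}Z_t\Big\} = \frac{1}{2\sqrt{t}}\mathbb{E}\{\bar\Phi^{\top}\nabla f(Z_t)\} - \frac{1}{2\sqrt{1-t}}\mathbb{E}\{\bar\Psi^{\top}\nabla f(Z_t)\}. \tag{A.18}$$

Further, we use the Gaussian integration by parts formula (see e.g. Section A.6 in Talagrand (2003)). If $(x_1, \dots, x_{p_{\mathrm{sum}}})^{\top}$ is a centered Gaussian vector and $f(x_1, \dots, x_{p_{\mathrm{sum}}})$ is s.t. the integrals below exist, then it holds for all $j = 1, \dots, p_{\mathrm{sum}}$:

$$\mathbb{E}\{x_jf(x_1, \dots, x_{p_{\mathrm{sum}}})\} = \sum_{k=1}^{p_{\mathrm{sum}}}\mathbb{E}(x_jx_k)\,\mathbb{E}\Big\{\frac{\partial}{\partial x_k}f(x_1, \dots, x_{p_{\mathrm{sum}}})\Big\}. \tag{A.19}$$

Let $\bar\phi^j$ and $\bar\psi^j$ denote the $j$-th coordinates of $\bar\Phi$ and $\bar\Psi$. Let also $\frac{\partial}{\partial z_j}f(Z_t)$ denote the partial derivative of $f(Z_t)$ w.r.t. the $j$-th coordinate of $Z_t$. Then, due to (A.19) and the independence of $\bar\Phi$ and $\bar\Psi$,

$$\mathbb{E}\{\bar\Phi^{\top}\nabla f(Z_t)\} = \sum_{j=1}^{p_{\mathrm{sum}}}\mathbb{E}\Big\{\bar\phi^j\frac{\partial}{\partial z_j}f(Z_t)\Big\} = \sum_{j,q=1}^{p_{\mathrm{sum}}}\mathbb{E}(\bar\phi^j\bar\phi^q)\,\mathbb{E}\Big\{\frac{\partial}{\partial\bar\phi^q}\frac{\partial}{\partial z_j}f(Z_t)\Big\} = \sqrt{t}\sum_{j,q=1}^{p_{\mathrm{sum}}}\mathbb{E}(\bar\phi^j\bar\phi^q)\,\mathbb{E}\Big\{\frac{\partial^2}{\partial z_q\partial z_j}f(Z_t)\Big\}.$$

Similarly for the second term in (A.18):

$$\mathbb{E}\{\bar\Psi^{\top}\nabla f(Z_t)\} = \sqrt{1-t}\sum_{j,q=1}^{p_{\mathrm{sum}}}\mathbb{E}(\bar\psi^j\bar\psi^q)\,\mathbb{E}\Big\{\frac{\partial^2}{\partial z_q\partial z_j}f(Z_t)\Big\},$$

therefore

$$|\varkappa'(t)| = \frac{1}{2}\Big|\sum_{j=1}^{p_{\mathrm{sum}}}\sum_{q=1}^{p_{\mathrm{sum}}}\{\mathbb{E}(\bar\phi^j\bar\phi^q) - \mathbb{E}(\bar\psi^j\bar\psi^q)\}\,\mathbb{E}\Big\{\frac{\partial^2}{\partial z_q\partial z_j}f(Z_t)\Big\}\Big| \le \frac{1}{2}\|\mathrm{Var}\,\bar\Phi - \mathrm{Var}\,\bar\Psi\|_{\max}\sup_{t\in[0,1]}\|\mathbb{E}\nabla^2f(Z_t)\|_1.$$

□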
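Both ingredients of this proof can be checked numerically in a small example. The covariance matrices and vectors below are random or hand-picked illustrations; none of them come from the paper. For the quadratic $f(x) = x^{\top}Ax/2$ the Hessian is constant. In that case $\mathbb{E}f(\bar\Phi) - \mathbb{E}f(\bar\Psi) = \mathrm{tr}\{A(\mathrm{Var}\,\bar\Phi - \mathrm{Var}\,\bar\Psi)\}/2$ exactly, and Lemma A.7 reduces to Hölder's inequality. For $f(x) = \sin(a^{\top}x)$, both sides of (A.19) have a closed form or are cheap to simulate, because $\mathbb{E}\cos(a^{\top}x) = \exp(-a^{\top}\Sigma a/2)$ for $x \sim \mathcal{N}(0, \Sigma)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4


def random_cov(p, rng):
    """A random positive-definite covariance matrix (illustration only)."""
    B = rng.standard_normal((p, p))
    return B @ B.T / p + np.eye(p)


S_phi, S_psi = random_cov(p, rng), random_cov(p, rng)

# Lemma A.7 for f(x) = x^T A x / 2: the Hessian A is constant, so the gap is exact.
A = rng.standard_normal((p, p))
A = (A + A.T) / 2.0
gap = 0.5 * np.trace(A @ (S_phi - S_psi))                         # E f(Phi) - E f(Psi)
bound = 0.5 * np.max(np.abs(S_phi - S_psi)) * np.sum(np.abs(A))   # right-hand side of Lemma A.7

# Gaussian integration by parts (A.19) for f(x) = sin(a^T x), x ~ N(0, S_phi):
# E{x_j f(x)} = sum_k E(x_j x_k) E{a_k cos(a^T x)} = (S_phi a)_j * exp(-a^T S_phi a / 2).
a = np.array([0.3, -0.2, 0.5, 0.1])
x = rng.multivariate_normal(np.zeros(p), S_phi, size=400_000)
lhs = np.mean(x * np.sin(x @ a)[:, None], axis=0)                 # Monte Carlo estimate
rhs = S_phi @ a * np.exp(-a @ S_phi @ a / 2.0)                    # closed form
print(gap, bound)
print(lhs, rhs)
```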


A.3 Simultaneous anti-concentration for $\ell_2$-norms of Gaussian vectors

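Lemma A.8 (Simultaneous Gaussian anti-concentration). Let $(\bar\phi_1^{\top}, \dots, \bar\phi_K^{\top})^{\top} \in \mathbb{R}^{p_{\mathrm{sum}}}$ be a centered normally distributed random vector, and $\bar\phi_j \in \mathbb{R}^{p_j}$, $j = 1, \dots, K$. It holds for all $z_j \ge \sqrt{p_j}$ and $0 < \Delta_j \le z_j$, $j = 1, \dots, K$:

$$\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\bar\phi_j\| > z_j\}\Big) - \mathbb{P}\Big(\bigcup_{j=1}^K\{\|\bar\phi_j\| > z_j + \Delta_j\}\Big) \le \Delta_{\mathrm{ac}}(\{\Delta_j\}),$$

where

$$\Delta_{\mathrm{ac}}(\{\Delta_j\}) \le C\varkappa\sqrt{1\vee\log(K/\varkappa)} + C\max_{1\le j\le K}\{\Delta_j/z_j\}\max_{1\le j\le K}\sqrt{p_j\log(2z_j/\Delta_j)},$$

and $\varkappa \stackrel{\mathrm{def}}{=} \max_{1\le j\le K}\{\Delta_j/z_j\} \le 1$ is a deterministic positive constant. An explicit definition of $\Delta_{\mathrm{ac}}(\{\Delta_j\})$ is given in (A.22).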
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Proof of Lemma A. 8. 

p (ftsm > a) - p (uf„ {ift-ii > +a}) 

F (ufl , 1 1 A U, 1 - 1 > o}) - IP {|5, U, - 1 > «}) 

(li 'Sk {"AHA 1 -'}>o)-P (,§§, {IIAIIA 1 -*}>*) 

M 


< 


= IP 


It holds 


< IP ( 0 < max |||0j||^- 1 — 1 !• < x 1 . 


1107II = SU P 17 4>j\ ■ 

7£JR p U 1 J 

11 - 711=1 


(A.20) 


Let Gj(ej) C 1R Pj (for 1 < j < K) denote a finite £j -net on (pj — 1)-sphere of radius 
1: 


V 7 € JR?* s.t. || 7 || =1 3 7o € G j(£j) : || 7 - 7o || < 
This implies for all j = 1,... ,K 

(1 - £j)|| 0 i || < max { 7 T 0 j| < 110 j 11 • 
Let us take £\,... , £k > 0 s.t. Vj = 1,... ,K 


= 1 . 


£;||0;lk; < x, 


(A.21) 


then 


0 < max 


— max max 


7 T 07 


l<j<K I Zj I \<j<K 7eGj (e,) I Zj 

and the inequality (A.20) continues as 


< x, 




< IP 


- lj> < X 
T 


7 0; 


max sup 

1 —•J—'ft’ 7 cg, (cj) I z; 


- 1 


< X 


The random values -y T - 1 ~ 2 Var{ 7 T 0^}) . The anti-concentration inequal¬ 

ity by Chernozhukov et al. (2014c) for the maximum of a centered high-dimensional 
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Gaussian vector (see Theorem A.9 below), applied to maxi<j<^ sup 7eGj ,( £j .) 4>jZ- 
implies 


IP 


max sup 


7 T 0, 




< A 



(A.22) 


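where the constant $C_{ac}$ depends on the minimum and the maximum of $\operatorname{Var}\{\gamma^\top\phi_j z_j^{-1}\} \le \mathbb{E}\|\phi_j\|^2/z_j^2 \le 1$; the sum $\sum_{j=1}^K\{C/\varepsilon_j\}^{p_j}$ is proportional to the cardinality of the set $\{\gamma^\top\phi_j:\ \gamma\in G_j(\varepsilon_j),\ j = 1,\dots,K\}$. If one takes $\varepsilon_j = 2C\{\Delta_j/(2z_j)\}$, then (A.21) holds with exponentially high probability due to Gaussianity of the vectors $\phi_j$ and Theorem 1.2 in Spokoiny (2012b), hence

$$\begin{aligned}
\Delta_{ac} &\le C_{ac}\varkappa\sqrt{1\vee\log\Big(\sum_{j=1}^K\{2z_j/\Delta_j\}^{p_j+1}\Big/\varkappa\Big)}\\
&\le C_{ac}\varkappa\sqrt{1\vee\log(K/\varkappa)} + C\max_{1\le j\le K}\{\Delta_j\}\sqrt{\max_{1\le j\le K}\log(2z_j/\Delta_j)}.
\end{aligned}\tag{A.23}$$

□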
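The $\varepsilon$-net sandwich $(1-\varepsilon_j)\|\phi_j\| \le \max_{\gamma\in G_j(\varepsilon_j)}\gamma^\top\phi_j \le \|\phi_j\|$ used in the proof can be checked directly in the smallest nontrivial case $p_j = 2$, where an $\varepsilon$-net of the unit circle is simply a set of equally spaced points. This is a sketch for intuition only; `circle_net` is a hypothetical helper, and in higher dimensions the size of such a net grows exponentially in $p_j$, so it is used only as a counting device in the proof.

```python
import numpy as np


def circle_net(eps):
    """Equally spaced points on the unit circle forming an eps-net.

    With m points the angular spacing is h = 2*pi/m, so every unit vector lies
    within angle h/2 (chord length 2*sin(h/4)) of some net point; m is chosen
    so that this chord length is at most eps."""
    m = int(np.ceil(np.pi / (2.0 * np.arcsin(eps / 2.0))))
    angles = 2.0 * np.pi * np.arange(m) / m
    return np.column_stack([np.cos(angles), np.sin(angles)])


rng = np.random.default_rng(1)
eps = 0.1
net = circle_net(eps)                        # rows gamma_0 with ||gamma_0|| = 1
phi = 3.0 * rng.standard_normal((1000, 2))   # arbitrary vectors phi
norms = np.linalg.norm(phi, axis=1)
net_max = (phi @ net.T).max(axis=1)          # max over the net of gamma^T phi
print(len(net), np.all(net_max <= norms + 1e-12), np.all(net_max >= (1 - eps) * norms))
```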


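Theorem A.9 (Anti-concentration inequality for maxima of a Gaussian random vector, Chernozhukov et al. (2014c)). Let $(X_1,\dots,X_p)^\top$ be a centered Gaussian random vector with $\sigma_j^2 := \mathbb{E}X_j^2 > 0$ for all $1\le j\le p$. Let $\underline{\sigma} := \min_{1\le j\le p}\sigma_j$ and $\bar\sigma := \max_{1\le j\le p}\sigma_j$. Then for every $\varepsilon > 0$

$$\sup_{x\in\mathbb{R}}\mathbb{P}\Big(\Big|\max_{1\le j\le p}X_j - x\Big| \le \varepsilon\Big) \le C_{ac}\,\varepsilon\sqrt{1\vee\log(p/\varepsilon)},$$

where $C_{ac}$ depends only on $\underline{\sigma}$ and $\bar\sigma$. When the variances are all equal, namely $\underline{\sigma} = \bar\sigma = \sigma$, $\log(p/\varepsilon)$ on the right side can be replaced by $\log p$.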
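Theorem A.9 can be illustrated in the equal-variance case: for the maximum $M$ of $p$ i.i.d. $\mathcal{N}(0,1)$ variables, the sketch estimates $\sup_x\mathbb{P}(|M - x| \le \varepsilon)$ on a grid of $x$ values and reports its ratio to $\varepsilon\sqrt{\log p}$. The sample sizes, the grid and the helper name are arbitrary choices made for illustration; a ratio that stays bounded as $\varepsilon$ varies is what the theorem predicts, while the constant $C_{ac}$ itself is not identified by this experiment.

```python
import numpy as np


def max_concentration(samples, eps, grid):
    """Estimate sup_x P(|M - x| <= eps) over a grid of x from samples of M."""
    s = np.sort(samples)
    lo = np.searchsorted(s, grid - eps, side="left")   # first index with s >= x - eps
    hi = np.searchsorted(s, grid + eps, side="right")  # first index with s > x + eps
    return float(((hi - lo) / s.size).max())


rng = np.random.default_rng(2)
p, n_draws = 200, 20_000
M = rng.standard_normal((n_draws, p)).max(axis=1)  # max of p i.i.d. N(0, 1)
grid = np.linspace(M.min(), M.max(), 400)
for eps in (0.2, 0.1, 0.05):
    conc = max_concentration(M, eps, grid)
    # equal variances: the bound reads C * eps * sqrt(log p)
    print(f"eps = {eps:4.2f}  sup_x P = {conc:.4f}  ratio = {conc / (eps * np.sqrt(np.log(p))):.2f}")
```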


A.4 Proof of Proposition A.1

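Proof of Proposition A.1. Let $\Phi := (\phi_1^\top,\dots,\phi_K^\top)^\top \in \mathbb{R}^{p_{sum}}$ for $p_{sum} := p_1 + \dots + p_K$ (as in (A.5)), and similarly $\Psi := (\psi_1^\top,\dots,\psi_K^\top)^\top \in \mathbb{R}^{p_{sum}}$. Let also $\overline{\Phi} \sim \mathcal{N}(0,\operatorname{Var}\Phi)$ and $\overline{\Psi} \sim \mathcal{N}(0,\operatorname{Var}\Psi)$. Introduce the following value, which comes from Lemma A.7 on Gaussian comparison:

$$\begin{aligned}
\delta^2(\Delta,\beta) &:= C^2(\Delta,\beta)\max_{1\le j\le K}\sup_{t\in[0,1]}\mathbb{E}\big\|\overline{\phi}_j\sqrt{t} + \overline{\psi}_j\sqrt{1-t}\big\|^2\\
&\le C^2(\Delta,\beta)\max_{1\le j\le K}\max\{\operatorname{tr}\operatorname{Var}(\phi_j), \operatorname{tr}\operatorname{Var}(\psi_j)\}.
\end{aligned}\tag{A.24}$$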
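It holds

$$\begin{aligned}
\mathbb{P}\Big(\bigcup_{j=1}^K\{\|\psi_j\| > z_j\}\Big)
&\ge \mathbb{P}\Big(\bigcup_{j=1}^K\Big\{\|\overline{\phi}_j\| > z_j + 2\Delta + \frac{3\log(K)}{\beta}\Big\}\Big) - \tfrac12\delta^2(\Delta,\beta) - \delta_{3,\psi}(\Delta,\beta) &&\text{(by L. A.7, A.6 and L. A.3)}\\
&\ge \mathbb{P}\Big(\bigcup_{j=1}^K\{\|\overline{\phi}_j\| > z_j - \delta_{z_j}\}\Big) - \tfrac12\delta^2(\Delta,\beta) - \delta_{3,\psi}(\Delta,\beta)\\
&\qquad - 2\Delta_{ac}\big(\{\delta_{z_j} + 2\Delta + 3\log(K)/\beta\}\big) &&\text{(by L. A.8)}\quad(\mathrm{A.25})\\
&\ge \mathbb{P}\Big(\bigcup_{j=1}^K\{\|\phi_j\| > z_j - \delta_{z_j}\}\Big) - \tfrac12\delta^2(\Delta,\beta) - \delta_{3,\phi}(\Delta,\beta) - \delta_{3,\psi}(\Delta,\beta)\\
&\qquad - 2\Delta_{ac}\big(\{\delta_{z_j} + 2\Delta + 3\log(K)/\beta\}\big), &&\text{(by L. A.2)}\quad(\mathrm{A.26})
\end{aligned}$$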


where $\delta_{3,\psi}(\Delta,\beta)$ is defined similarly to $\delta_{3,\phi}(\Delta,\beta)$ in (A.15):


3/2 

hi’(A, P) = f ^Y^- I ^log 1/2 {K) log 3/2 (np sum ) (2 vq c^, A^ max ) 3/2 . (A.27) 


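By Lemma A.8, inequality (A.25) requires the following: $\delta_{z_j} + 2\Delta + 3\log(K)/\beta \le z_j$. The bound in the inverse direction is derived similarly. Denote the approximating error term obtained in (A.26) as

$$\Delta_{\varepsilon2} := \tfrac12\delta^2(\Delta,\beta) + \delta_{3,\phi}(\Delta,\beta) + \delta_{3,\psi}(\Delta,\beta) + 2\Delta_{ac}\big(\{\delta_{z_j} + 2\Delta + 3\log(K)/\beta\}\big).$$

Consider this term in more detail. By inequality (A.23)

$$\begin{aligned}
\Delta_{ac}\big(\{\delta_{z_j} + 2\Delta + 3\log(K)/\beta\}\big) &\le C\max_{1\le j\le K}\frac{\delta_{z_j} + 2\Delta + 3\log(K)/\beta}{z_j}\log^{1/2}(K)\\
&\quad + C\Big(\delta_{z,\max} + 2\Delta + \frac{3\log(K)}{\beta}\Big)\sqrt{\log(2z_{\max}) - \log\Big(\delta_{z,\min} + 2\Delta + \frac{3\log(K)}{\beta}\Big)}.
\end{aligned}$$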



Let us take $\beta = \log(K)/\Delta$, then

$$\begin{aligned}
\Delta_{ac} &\le 5C\Delta\frac{\log^{1/2}(K)}{z_{\min}} + C\max_{1\le j\le K}\frac{\delta_{z_j}}{z_j}\log^{1/2}(K) + C(5\Delta + \delta_{z,\max})\Big\{\log^{1/2}(2z_{\max}) + \sqrt{-\log(\delta_{z,\min} + 5\Delta)}\Big\}\\
&\le 5C\Delta\frac{\log^{1/2}(K)}{z_{\min}} + C\max_{1\le j\le K}\frac{\delta_{z_j}}{z_j}\log^{1/2}(K) + 2C(5\Delta + \delta_{z,\max})\sqrt{-\log(\delta_{z,\min} + 5\Delta)}\\
&\le 5C\Delta\frac{\log^{1/2}(K)}{z_{\min}} + C\max_{1\le j\le K}\frac{\delta_{z_j}}{z_j}\log^{1/2}(K) + 2C(5\Delta + \delta_{z,\max})\sqrt{-\log(5\Delta)}\\
&\le 6C\Delta\Big\{\frac{\log^{1/2}(K)}{z_{\min}} + 2\log^{1/2}(n^{1/2}/5)\Big\},
\end{aligned}\tag{A.28}$$

where the second inequality holds for $\delta_{z,\min} + 5\Delta \le 1/(2z_{\max})$, and the last one holds for $\delta_{z,\max} \le \Delta$ and $\Delta \ge n^{-1/2}$.

by (A.27) log''’/ 2 (/i) Pmaic 0/9 , , -> . 

<M A P) + AAA P) < C ™ log 3 / 2 (np sum ) (A 3 , max + A 3 imax ) , (A.29) 

by (A.24) log(iL) — _ 

5 v;d2 (A,/3 ) < CSr - r 7 j — max max {tr Varfcp •), tr Var(i/> ■)} 

/\~' \ <A 3 <A TA ^ j J J 


< 


s a 2 \<r<K j 

nX 2 ^°S(-^)„ \2 ~1 

^"17 /i 2 Pmax max | / '0,max > / 'i/’i max / ' 


After minimizing the sum of the expressions (A.28) and (A.29) w.r.t. $\Delta$, we have

$$\Delta_{\varepsilon2} \le 12.5C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\max\{c_{\phi,\max}^3, c_{\psi,\max}^3\}^{1/4}$$


+ 3.2C 6%p max z^ n ( j log 2 (A") log 3 / 4 (np sum ) max {Aj,, max , A ^ max } 7/2 


$$\le 25C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\max\{c_{\phi,\max}, c_{\psi,\max}\}^{3/4},$$


where the last inequality holds for 

k)3 \ -1/8 

m£ 

n 


5% < 4Cp max z mi / 


log ' / 8 (A)log 3 / 8 (np sum ) (max{A^ max , A^ 1>max }) 11/4 . 


□ 
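The final step of the proof balances a term increasing in $\Delta$ against terms decreasing in $\Delta$. As a sketch of why this produces a power $1/8$ of $p_{\max}^3/n$, assume for illustration that the two contributions behave like $a\Delta$ (the linear form of the last line of (A.28)) and $b\Delta^{-3}$ with $b \propto p_{\max}^{3/2}/\sqrt n$. This functional form is an assumption, not a formula taken from the text, and the helper `balance` is hypothetical.

```python
import numpy as np


def balance(a, b):
    """Minimise f(D) = a*D + b/D**3 over D > 0 (a, b > 0).

    Setting f'(D) = a - 3*b/D**4 = 0 gives D* = (3b/a)**(1/4) and
    f(D*) = (4/3) * a**(3/4) * (3b)**(1/4)."""
    d_star = (3.0 * b / a) ** 0.25
    return d_star, a * d_star + b / d_star**3


# If b is proportional to p**1.5 / sqrt(n) (an illustrative assumption), the
# minimum scales like (p**3 / n)**(1/8): the printed ratio stays constant.
for p, n in [(5, 10_000), (10, 10_000), (10, 100_000)]:
    _, f_min = balance(1.0, p**1.5 / np.sqrt(n))
    print(p, n, f_min, f_min / (p**3 / n) ** 0.125)
```

Under this assumption the constant in front of $(p_{\max}^3/n)^{1/8}$ is $(4/3)\,3^{1/4}a^{3/4}$, which is why the optimized bound carries fractional powers of the logarithmic factors.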
B Square-root Wilks approximations

This section's goal is to derive square-root Wilks approximations simultaneously for $K$ parametric models, for the $Y$-world and for the bootstrap world. This is done in Section B.3 below. Both results are used in the approximating scheme (3.1) for the bootstrap justification. In order to make the text self-contained, we recall in Section B.1 some results from the general finite sample theory by Spokoiny (2012a,b, 2013). In Section B.2 we recall similar finite sample results for the bootstrap world for a single parametric model, obtained in Spokoiny and Zhilova (2014).

B.1 Finite sample theory

Let us use the notation given in the introduction: $L_k(\theta)$, $k = 1,\dots,K$, are the log-likelihood processes, which depend on the data $Y$ and correspond to the regular parametric families of probability distributions $\{\mathbb{P}_k(\theta),\ \theta\in\Theta_k\subset\mathbb{R}^{p_k}\}$. The general finite sample approach by Spokoiny (2012a) does not require that the true distribution $\mathbb{P}$ of the data $Y$ belongs to any of the parametric families $\{\mathbb{P}_k(\theta)\}$. The target parameters $\theta^*_k$ are defined as in (1.3) by projection of the true measure $\mathbb{P}$ on $\{\mathbb{P}_k(\theta)\}$. Let $D_k^2$ denote the full Fisher information $p_k\times p_k$ matrices, which are deterministic, symmetric and positive-definite:

$$D_k^2 := -\nabla^2_\theta\,\mathbb{E}L_k(\theta^*_k).$$

The centered $p_k$-dimensional random vectors $\xi_k$ denote the normalised scores:

$$\xi_k := D_k^{-1}\nabla_\theta L_k(\theta^*_k).$$

Introduce the following elliptic vicinities around the true points $\theta^*_k$:

$$\Theta_{0,k}(r) := \{\theta\in\Theta_k:\ \|D_k(\theta - \theta^*_k)\| \le r\}.$$

Let $1\le k\le K$ be fixed. The non-asymptotic Wilks approximating bound by Spokoiny (2012a, 2013) requires that the maximum likelihood estimate $\tilde\theta_k$ gets into the local vicinity $\Theta_{0,k}(r_{0,k})$ of some radius $r_{0,k} > 0$ with probability $\ge 1 - 3e^{-x}$, $x > 0$. This is guaranteed by the following concentration result:

Theorem B.1 (Concentration of the MLE, Spokoiny (2013)). Let the conditions $(ED_0)$, $(ED_2)$, $(\mathcal{L}_0)$, $(\mathcal{I})$ and $(\mathcal{L}\mathtt{r})$ be fulfilled. If for each $k = 1,\dots,K$, for the constants $r_{0,k} > 0$ and for the functions $\mathtt{b}_k(r)$ from $(\mathcal{L}\mathtt{r})$ it holds

$$\mathtt{b}_k(r)\,r \ge 2\big\{\mathfrak{z}_{qf}(x,\mathbb{B}_k) + 6\omega_k\nu_k\,\mathfrak{z}_k\big(x + \log(2r/r_{0,k})\big)\big\},\qquad r > r_{0,k}, \tag{B.1}$$

where the functions $\mathfrak{z}_k(x)$ and $\mathfrak{z}_{qf}(x,\mathbb{B}_k)$ are defined in (B.3) and (B.4) respectively, then it holds for all $k = 1,\dots,K$

$$\mathbb{P}\big(\tilde\theta_k\notin\Theta_{0,k}(r_{0,k})\big) \le 3e^{-x}.$$

The constants $\omega_k$, $\nu_k$ and $a_k$ come from the imposed conditions $(ED_0)$ through $(\mathcal{I})$ (from Section 5). In the case 5.3, $r_{0,k} \ge C\sqrt{p_k + x}$.

Theorem B.2 (Wilks approximation, Spokoiny (2013)). Under the conditions of Theorem B.1, for some $r_{0,k} > 0$ s.t. (B.1) is fulfilled, it holds for each $k = 1,\dots,K$ with probability $\ge 1 - 5e^{-x}$

$$\big|2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\} - \|\xi_k\|^2\big| \le \Delta_{k,w2}(r_{0,k},x),$$

$$\Big|\sqrt{2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\}} - \|\xi_k\|\Big| \le \Delta_{k,w}(r_{0,k},x)$$

for

$$\begin{aligned}
\Delta_{k,w}(r,x) &:= 3r\{\delta_k(r) + 6\nu_k\mathfrak{z}_k(x)\omega_k\}, &&(\mathrm{B.2})\\
\Delta_{k,w2}(r,x) &:= \tfrac{4}{3}\{2r + \mathfrak{z}_{qf}(x,\mathbb{B}_k)\}\Delta_{k,w}(r,x),\\
\mathfrak{z}_k(x) &:= 2\sqrt{p_k} + \sqrt{2x} + 4p_k(x\mathtt{g}_k^{-2} + 1)\mathtt{g}_k^{-1}. &&(\mathrm{B.3})
\end{aligned}$$

In the case 5.3 it holds for $r \le r_{0,k}$:

$$\Delta_{k,w}(r,x) \le C\frac{p_k + x}{\sqrt n},\qquad \Delta_{k,w2}(r,x) \le C\sqrt{\frac{(p_k + x)^3}{n}}.$$

The constants $\mathtt{g}_k$ and $\delta_k(r)$ come from the imposed conditions $(ED_0)$, $(\mathcal{L}_0)$ (from Section 5). The function $\mathfrak{z}_{qf}(x,\mathbb{B}_k)$, defined in (B.4), corresponds to the quantile function of deviations of the approximating random value $\|\xi_k\|$ (see Theorem B.3 below).


The following theorem characterizes the tail behaviour of the approximating terms $\|\xi_k\|^2$. It states that, if the vectors $\xi_k$ have bounded exponential moments (conditions $(ED_0)$, $(\mathcal{I})$), their squared Euclidean norms $\|\xi_k\|^2$ have three regimes of deviations: the sub-Gaussian, the Poissonian and the large-deviations zone.


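Theorem B.3 (Deviation bound for a random quadratic form, Spokoiny (2012b)). Let condition $(ED_0)$ be fulfilled, then for $\mathtt{g}_k \ge \sqrt{2\operatorname{tr}(\mathbb{B}_k^2)}$ it holds for each $k = 1,\dots,K$:

$$\mathbb{P}\big(\|\xi_k\|^2 > \mathfrak{z}_{qf}^2(x,\mathbb{B}_k)\big) \le 2e^{-x} + 8.4e^{-x_{c,k}},$$

where $\mathbb{B}_k^2 := D_k^{-1}V_k^2D_k^{-1}$, $\lambda(\mathbb{B}_k)$ is the maximum eigenvalue of $\mathbb{B}_k^2$,

$$\mathfrak{z}_{qf}^2(x,\mathbb{B}_k) := \begin{cases}
\operatorname{tr}(\mathbb{B}_k^2) + \sqrt{8\operatorname{tr}(\mathbb{B}_k^4)\,x}, & x \le \sqrt{2\operatorname{tr}(\mathbb{B}_k^4)}/\{18\lambda(\mathbb{B}_k)\},\\
\operatorname{tr}(\mathbb{B}_k^2) + 6x\lambda(\mathbb{B}_k), & \sqrt{2\operatorname{tr}(\mathbb{B}_k^4)}/\{18\lambda(\mathbb{B}_k)\} < x \le x_{c,k},\\
|z_{c,k} + 2(x - x_{c,k})/\mathtt{g}_{c,k}|^2\,\lambda(\mathbb{B}_k), & x > x_{c,k},
\end{cases}\tag{B.4}$$

$$\begin{aligned}
2x_{c,k} &:= 2x_c(\mathbb{B}_k) := \mu_c z_{c,k}^2 + \log\det\big(I_{p_k} - \mu_c\mathbb{B}_k^2/\lambda(\mathbb{B}_k)\big), &&(\mathrm{B.5})\\
z_{c,k}^2 &:= \{\mathtt{g}_k^2/\mu_c^2 - \operatorname{tr}(\mathbb{B}_k^2)/\mu_c\}/\lambda(\mathbb{B}_k),\\
\mathtt{g}_{c,k} &:= \sqrt{\mathtt{g}_k^2 - \mu_c\operatorname{tr}(\mathbb{B}_k^2)}\Big/\sqrt{\lambda(\mathbb{B}_k)},\\
\mu_c &:= 2/3.
\end{aligned}$$

The matrices $V_k$ come from condition $(ED_0)$ and can be defined as

$$V_k^2 := \operatorname{Var}\{\nabla_\theta L_k(\theta^*_k)\}.$$

By condition $(\mathcal{I})$, $\operatorname{tr}(\mathbb{B}_k^2) \le a_k^2p_k$, $\operatorname{tr}(\mathbb{B}_k^4) \le a_k^4p_k$ and $\lambda(\mathbb{B}_k) \le a_k^2$. In the case 5.3, $\mathtt{g}_k = C\sqrt n$, hence $x_{c,k} = Cn$, and for $x \le x_{c,k}$ it holds:

$$\mathfrak{z}_{qf}^2(x,\mathbb{B}_k) \le a_k^2(p_k + 6x). \tag{B.6}$$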


B.2 Finite sample theory for the bootstrap world

Introduce for each $k = 1,\dots,K$ the bootstrap score vectors at the point $\theta\in\Theta_k$:

$$\xi^\circ_k(\theta) := D_k^{-1}\nabla_\theta\zeta^\circ_k(\theta) = \sum_{i=1}^n D_k^{-1}\nabla_\theta\ell_{i,k}(\theta)(u_i - 1).$$
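To make the bootstrap objects concrete, the sketch below computes the multiplier-bootstrap likelihood ratio $2\{L^\circ(\tilde\theta^\circ) - L^\circ(\tilde\theta)\}$ for a toy Gaussian location model $y_i \sim \mathcal{N}(\theta, 1)$ with a single parameter. The model, the exponential weights $u_i \sim \mathrm{Exp}(1)$ (mean and variance equal to 1) and the helper name are illustrative assumptions, not the paper's simulation setting. For this model the weighted log-likelihood is quadratic, so the bootstrap MLE is a weighted mean and the statistic has a closed form; its conditional distribution should be close to $\chi^2_1$.

```python
import numpy as np


def bootstrap_lr(y, n_boot=5000, seed=4):
    """Multiplier-bootstrap LR statistics for the toy model y_i ~ N(theta, 1).

    L(theta) = -sum_i (y_i - theta)^2 / 2 and L°(theta) = sum_i u_i * l_i(theta)
    with i.i.d. weights u_i ~ Exp(1). The bootstrap MLE is the weighted mean and
    2{L°(theta°) - L°(theta_tilde)} = sum(u) * (theta° - theta_tilde)^2."""
    rng = np.random.default_rng(seed)
    theta_tilde = y.mean()                        # MLE in the Y-world
    u = rng.exponential(1.0, size=(n_boot, y.size))
    su = u.sum(axis=1)
    theta_boot = (u @ y) / su                     # bootstrap MLE for each weight draw
    return su * (theta_boot - theta_tilde) ** 2


y = np.random.default_rng(5).normal(loc=1.0, scale=1.0, size=500)
lr_boot = bootstrap_lr(y)
print(np.quantile(lr_boot, 0.95))  # compare with the chi^2_1 quantile 3.84
```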


Theorem B.4 (Bootstrap Wilks approximation, Spokoiny and Zhilova (2014)). Under the conditions of Theorems B.1 and B.5, for each $k = 1,\dots,K$ and some $r_{0,k} > 0$ s.t. (B.1) and (B.9) are fulfilled, it holds for each $k$ with $\mathbb{P}$-probability $\ge 1 - 5e^{-x}$

$$\mathbb{P}^\circ\Big(\Big|\sup_{\theta\in\Theta_k}2\{L^\circ_k(\theta) - L^\circ_k(\tilde\theta_k)\} - \|\xi^\circ_k(\tilde\theta_k)\|^2\Big| \le \Delta^\circ_{k,w2}(r_{0,k},x)\Big) \ge 1 - 4e^{-x},$$

$$\mathbb{P}^\circ\Big(\Big|\sqrt{\sup_{\theta\in\Theta_k}2\{L^\circ_k(\theta) - L^\circ_k(\tilde\theta_k)\}} - \|\xi^\circ_k(\tilde\theta_k)\|\Big| \le \Delta^\circ_{k,w}(r_{0,k},x)\Big) \ge 1 - 4e^{-x},$$

where the error terms $\Delta^\circ_{k,w}(r,x)$, $\Delta^\circ_{k,w2}(r,x)$ are deterministic and

$$\begin{aligned}
\Delta^\circ_{k,w}(r,x) &:= 2\Delta_{k,w}(r,x) + 36\nu_k r\,\omega_{1,k}(r,x)\,\mathfrak{z}_k(x), &&(\mathrm{B.7})\\
\Delta^\circ_{k,w2}(r,x) &:= \tfrac{1}{3}\big\{12r\Delta^\circ_{k,w}(r,x) + \Delta^\circ_{k,w}(r,x)^2\big\};
\end{aligned}$$

$\Delta_{k,w}(r,x)$ and $\mathfrak{z}_k(x)$ are defined in (B.2) and (B.3) respectively, and

$$\omega_{1,k}(r,x) := C_{m,k}(r) + 2\omega_k\nu_k\sqrt{2x}, \tag{B.8}$$

where $C_{m,k}(r)$, $\omega_k$, $\nu_k$ come from the imposed conditions $(\mathcal{L}_{0m})$, $(ED_2)$ and $(ED_0)$. For the case 5.3 and $r \le r_{0,k}$ it holds:

$$\Delta^\circ_{k,w}(r,x) \le C\frac{p_k + x}{\sqrt n},\qquad \Delta^\circ_{k,w2}(r,x) \le C\sqrt{\frac{(p_k + x)^3}{n}},$$

and $\omega_{1,k}(r,x) \le Cr/n + C\sqrt{x/n}$.

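Theorem B.5 (Concentration of the bootstrap MLE, Spokoiny and Zhilova (2014)). Let the conditions of Theorems B.1 and B.7, $(\mathcal{L}_{0m})$ and $(ED_{2m})$ be fulfilled. If the following holds for each $k = 1,\dots,K$, for $\omega_{1,k}(r,x)$ defined in (B.8) and the $\mathbb{P}$-random matrices $\mathbb{B}^{\circ2}_k := D_k^{-1}\operatorname{Var}^\circ\{\nabla_\theta L^\circ_k(\theta^*_k)\}D_k^{-1}$:

$$\begin{aligned}
\mathtt{b}_k(r)\,r &\ge 2\big\{\mathfrak{z}_{qf}(x,\mathbb{B}_k) + \mathfrak{z}_{qf}(x,\mathbb{B}^\circ_k) + 6\nu_k\mathfrak{z}_k(x)\,\omega_{1,k}(r_{0,k},x)\,r_{0,k}\big\}\\
&\quad + 12\nu_k\big(\omega_k + \omega_{1,k}(r,x)\big)\mathfrak{z}_k\big(x + \log(2r/r_{0,k})\big)\quad\text{for } r > r_{0,k},
\end{aligned}\tag{B.9}$$

then for each $k$ it holds with $\mathbb{P}$-probability $\ge 1 - 3e^{-x}$

$$\mathbb{P}^\circ\big(\tilde\theta^\circ_k\notin\Theta_{0,k}(r_{0,k})\big) \le 3e^{-x}.$$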

Lemma B.6 below is implied straightforwardly by Lemma B.7 in Spokoiny and Zhilova (2014).

Lemma B.6. Let the conditions (Eb), $(\mathcal{L}_{0m})$ and $(ED_{2m})$ be fulfilled, then for each $k = 1,\dots,K$ it holds for $r \le r_{0,k}$ with $\mathbb{P}$-probability $\ge 1 - e^{-x}$

$$\mathbb{P}^\circ\Big(\sup_{\theta\in\Theta_{0,k}(r)}\|\xi^\circ_k(\theta) - \xi^\circ_k(\theta^*_k)\| \le \Delta^\circ_{\xi,k}(r,x)\Big) \ge 1 - e^{-x},$$

where

$$\Delta^\circ_{\xi,k}(r,x) := 6\nu_k\,\mathfrak{z}_k(x)\,\omega_{1,k}(r,x)\,r.$$

In the case 5.3 it holds for the bounding term

$$\Delta^\circ_{\xi,k}(r_{0,k},x) \le C\frac{p_k + x}{\sqrt n}.$$










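Theorem B.7 (Deviation bound for the bootstrap quadratic form, Spokoiny and Zhilova (2014)). Let the conditions (Eb), $(\mathcal{I})$, $(SD_1)$, $(\mathcal{I}_B)$ be fulfilled, then for each $k = 1,\dots,K$ and $\mathtt{g}_k \ge \sqrt{2\operatorname{tr}(\mathbb{B}^{\circ2}_k)}$ it holds:

$$\mathbb{P}^\circ\big(\|\xi^\circ_k(\theta^*_k)\|^2 \le \mathfrak{z}_{qf}^2(x,\mathbb{B}^\circ_k)\big) \ge 1 - 2e^{-x} - 8.4e^{-x_{c,k}(\mathbb{B}^\circ_k)},$$

where

$$\mathbb{B}^{\circ2}_k := D_k^{-1}V^{\circ2}_k(\theta^*_k)D_k^{-1},\qquad V^{\circ2}_k(\theta) := \operatorname{Var}^\circ\nabla_\theta L^\circ_k(\theta);$$

$\mathfrak{z}_{qf}(x,\cdot)$ and $x_{c,k}(\cdot)$ are defined respectively in (B.4) and (B.5). Similarly to (B.6), it holds for $x \le x_{c,k}(\mathbb{B}^\circ_k)$:

$$\mathfrak{z}_{qf}^2(x,\mathbb{B}^\circ_k) \le a^{\circ2}_k(p_k + 6x)$$

for $a^{\circ2}_k := (1 + \delta_V^2(x))(a_k^2 + a_{B,k}^2)$ and $\delta_V^2(x)$ defined in (C.1) (see Section C.1 on Bernstein matrix inequalities).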


B.3 Simultaneous square-root Wilks approximations

The statements below follow from the results of Sections B.1 and B.2 by the probability union bound.

Lemma B.8 (Simultaneous concentration bounds).

1. Let the conditions of Theorem B.1 be fulfilled and (B.1) hold for each $k = 1,\dots,K$ with $x = x_1 + \log(K)$ for some $x_1 > 0$, then

$$\mathbb{P}\Big(\bigcup_{k=1}^K\{\tilde\theta_k\notin\Theta_{0,k}(r_{0,k})\}\Big) \le 3e^{-x_1}.$$

2. Let the conditions of Theorem B.5 be fulfilled and (B.9) hold for each $k = 1,\dots,K$ with $x = x_1 + \log(K)$ for some $x_1 > 0$, then it holds with $\mathbb{P}$-probability $\ge 1 - 3e^{-x_1}$

$$\mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\tilde\theta^\circ_k\notin\Theta_{0,k}(r_{0,k})\}\Big) \le 3e^{-x_1}.$$
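The choice $x = x_1 + \log(K)$ is exactly what makes the union bound free of $K$. For part 1, applying Theorem B.1 to each $k$ and summing gives the following worked step (part 2 is analogous in the bootstrap world):

$$\mathbb{P}\Big(\bigcup_{k=1}^K\{\tilde\theta_k\notin\Theta_{0,k}(r_{0,k})\}\Big) \le \sum_{k=1}^K\mathbb{P}\big(\tilde\theta_k\notin\Theta_{0,k}(r_{0,k})\big) \le K\cdot 3e^{-(x_1 + \log K)} = 3e^{-x_1}.$$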


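Lemma B.9 (Simultaneous Wilks approximations).

1. Let the conditions of part 1 of Lemma B.8 be fulfilled for some $r_{0,k} > 0$ and $x = x_1 + \log(K)$, then it holds

$$\mathbb{P}\Big(\bigcap_{k=1}^K\Big\{\big|2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\} - \|\xi_k\|^2\big| \le \Delta_{k,w2}\big(r_{0,k}, x_1 + \log(K)\big)\Big\}\Big) \ge 1 - 5e^{-x_1},$$

$$\mathbb{P}\Big(\bigcap_{k=1}^K\Big\{\Big|\sqrt{2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\}} - \|\xi_k\|\Big| \le \Delta_{k,w}\big(r_{0,k}, x_1 + \log(K)\big)\Big\}\Big) \ge 1 - 5e^{-x_1}.$$

2. Let the conditions of parts 1 and 2 of Lemma B.8 be fulfilled for some $r_{0,k} > 0$ and $x = x_1 + \log(K)$, then it holds with $\mathbb{P}$-probability $\ge 1 - 5e^{-x_1}$

$$\mathbb{P}^\circ\Big(\bigcap_{k=1}^K\Big\{\Big|\sup_{\theta\in\Theta_k}2\{L^\circ_k(\theta) - L^\circ_k(\tilde\theta_k)\} - \|\xi^\circ_k(\tilde\theta_k)\|^2\Big| \le \Delta^\circ_{k,w2}\big(r_{0,k}, x_1 + \log(K)\big)\Big\}\Big) \ge 1 - 4e^{-x_1},$$

$$\mathbb{P}^\circ\Big(\bigcap_{k=1}^K\Big\{\Big|\sqrt{\sup_{\theta\in\Theta_k}2\{L^\circ_k(\theta) - L^\circ_k(\tilde\theta_k)\}} - \|\xi^\circ_k(\tilde\theta_k)\|\Big| \le \Delta^\circ_{k,w}\big(r_{0,k}, x_1 + \log(K)\big)\Big\}\Big) \ge 1 - 4e^{-x_1}.$$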


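Lemma B.10. Let the conditions of Lemma B.6 be fulfilled, then it holds for $r \le r_{0,k}$ with $\mathbb{P}$-probability $\ge 1 - e^{-x}$

$$\mathbb{P}^\circ\Big(\bigcap_{k=1}^K\Big\{\sup_{\theta\in\Theta_{0,k}(r)}\|\xi^\circ_k(\theta) - \xi^\circ_k(\theta^*_k)\| \le \Delta^\circ_{\xi,k}\big(r, x + \log(K)\big)\Big\}\Big) \ge 1 - e^{-x}.$$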


C Proofs of the main results 

Before proving the statements from Section 3.2 we formulate below the Bernstein matrix 
inequality, which is necessary for the further proofs. 


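C.1 Bernstein matrix inequality

Here we restate Theorem 1.4 by Tropp (2012) for the random $p_{sum}\times p_{sum}$ matrix $V^{\circ2} := \operatorname{Var}^\circ\big(\nabla_\theta L^\circ_1(\theta^*_1)^\top,\dots,\nabla_\theta L^\circ_K(\theta^*_K)^\top\big)^\top$ from the bootstrap world. The matrix $V^{\circ2}$ equals the sum of the independent matrices $\operatorname{Var}^\circ\big(\nabla_\theta\ell_{i,1}(\theta^*_1)^\top u_i,\dots,\nabla_\theta\ell_{i,K}(\theta^*_K)^\top u_i\big)^\top$. Let us denote

$$g_i := \big(\nabla_\theta\ell_{i,1}(\theta^*_1)^\top,\dots,\nabla_\theta\ell_{i,K}(\theta^*_K)^\top\big)^\top \in \mathbb{R}^{p_{sum}},\qquad H^2 := \sum_{i=1}^n\mathbb{E}\{g_ig_i^\top\},$$

then

$$H^2 = \mathbb{E}V^{\circ2},\qquad H^{-1}V^{\circ2}H^{-1} - I_{p_{sum}} = \sum_{i=1}^n H^{-1}\big\{g_ig_i^\top - \mathbb{E}[g_ig_i^\top]\big\}H^{-1}.$$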


Define also the deterministic scalar value 
Theorem C.1 (Bernstein inequality for $V^{\circ2}$). Let the condition $(SD_1)$ be fulfilled, then it holds with probability $\ge 1 - e^{-x}$:

$$\|H^{-1}V^{\circ2}H^{-1} - I_{p_{sum}}\| \le \delta_V^2(x),$$

where the error term is defined as 

<5|(x) = f (log(p sum ) + x} + ^d 2 , {log(p sum ) + x} (C.l) 

and is proportional to $\sqrt{\{\log(p_{sum}) + x\}/n}$ in the case 5.3.

We omit here the proof of Theorem C.1, since it follows straightforwardly from Theorem 1.4 by Tropp (2012) and is already given in Spokoiny and Zhilova (2014).
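A small simulation illustrates the rate in Theorem C.1. It assumes, for simplicity, centred Gaussian vectors $g_i \sim \mathcal{N}(0,\Sigma)$ and independent bootstrap weights with $\operatorname{Var}u_i = 1$, so that $V^{\circ2} = \sum_i g_ig_i^\top$ and $H^2 = n\Sigma$; these choices and the helper name are hypothetical. The printed ratio compares $\|H^{-1}V^{\circ2}H^{-1} - I\|$ with $\sqrt{\{\log(p) + x\}/n}$ and should stay of order one as $n$ grows.

```python
import numpy as np


def normalized_deviation(n, p, rng):
    """Spectral norm ||H^{-1} V2 H^{-1} - I|| where V2 = sum_i g_i g_i^T and
    H^2 = E V2 = n * Sigma, for i.i.d. centred Gaussian g_i ~ N(0, Sigma)."""
    A = rng.standard_normal((p, p))
    sigma = A @ A.T / p + np.eye(p)            # some positive-definite covariance
    g = rng.standard_normal((n, p)) @ np.linalg.cholesky(sigma).T
    V2 = g.T @ g                               # sum_i g_i g_i^T
    w, U = np.linalg.eigh(n * sigma)           # eigen-decomposition of H^2
    H_inv = U @ np.diag(w ** -0.5) @ U.T       # symmetric inverse square root of H^2
    return float(np.linalg.norm(H_inv @ V2 @ H_inv - np.eye(p), ord=2))


rng = np.random.default_rng(6)
p, x = 10, 1.0
for n in (200, 1000, 5000):
    dev = normalized_deviation(n, p, rng)
    print(n, round(dev, 4), round(dev / np.sqrt((np.log(p) + x) / n), 2))
```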


C.2 Bootstrap validity for the case of one parametric model 

Here we state the results on bootstrap validity from Spokoiny and Zhilova (2014); they are used in some of the further proofs.


Theorem C.2. Let the conditions of Section 5 be fulfilled, then it holds for each $k = 1,\dots,K$ and $z_k \ge \max\{2,\sqrt{p_k}\} + C(p_k + x)/\sqrt n$ with probability $\ge 1 - 12e^{-x}$:

$$\Big|\mathbb{P}\big(L_k(\tilde\theta_k) - L_k(\theta^*_k) > z_k^2/2\big) - \mathbb{P}^\circ\big(L^\circ_k(\tilde\theta^\circ_k) - L^\circ_k(\tilde\theta_k) > z_k^2/2\big)\Big| \le \Delta_{full,k}.$$

The error term $\Delta_{full,k} \le C\{(p_k + x)^3/n\}^{1/8}$ in the case of i.i.d. model; see Section 5.3.


Theorem C.3 (Validity of the bootstrap under a small modeling bias). Assume the conditions of Theorem C.2. Then for $\alpha \le 1 - 8e^{-x}$ it holds

$$\Big|\mathbb{P}\big(L_k(\tilde\theta_k) - L_k(\theta^*_k) > \mathfrak{z}^\circ_k(\alpha)^2/2\big) - \alpha\Big| \le \Delta_{full,k}.$$

The error term $\Delta_{full,k} \le C\{(p_k + x)^3/n\}^{1/8}$ in the case of i.i.d. model; see Section 5.3.


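Theorem C.4 (Performance of the bootstrap for a large modeling bias). Under the conditions of Section 5 except for (SmB), it holds for $z_k \ge \max\{2,\sqrt{p_k}\} + C(p_k + x)/\sqrt n$ with probability $\ge 1 - 14e^{-x}$

1. $\mathbb{P}\big(L_k(\tilde\theta_k) - L_k(\theta^*_k) > z_k^2/2\big) \le \mathbb{P}^\circ\big(L^\circ_k(\tilde\theta^\circ_k) - L^\circ_k(\tilde\theta_k) > z_k^2/2\big) + \Delta_{b,full,k}$;

2. $$\begin{aligned}
\mathfrak{z}^\circ_k(\alpha)^2 &\ge \mathfrak{z}_k(\alpha + \Delta_{b,full,k})^2 + \operatorname{tr}\{D_k^{-1}H_k^2D_k^{-1}\} - \operatorname{tr}\{D_k^{-1}(H_k^2 - B_k^2)D_k^{-1}\} - \Delta_{qf,1,k},\\
\mathfrak{z}^\circ_k(\alpha)^2 &\le \mathfrak{z}_k(\alpha - \Delta_{b,full,k})^2 + \operatorname{tr}\{D_k^{-1}H_k^2D_k^{-1}\} - \operatorname{tr}\{D_k^{-1}(H_k^2 - B_k^2)D_k^{-1}\} + \Delta_{qf,2,k}.
\end{aligned}$$

The term $\Delta_{b,full,k} \le C\{(p_k + x)^3/n\}^{1/8}$ in the case of i.i.d. model; see Section 5.3. The positive values $\Delta_{qf,1,k}$, $\Delta_{qf,2,k}$ are bounded from above by $(a_k^2 + a_{B,k}^2)(\sqrt{8xp_k} + 6x)$ for the constants $a_k^2 > 0$, $a_{B,k}^2 > 0$ from conditions $(\mathcal{I})$, $(\mathcal{I}_B)$.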


C.3 Proof of Theorem 3.1 


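Lemma C.5 (Closeness of $\mathcal{L}(\|\xi_1\|,\dots,\|\xi_K\|)$ and $\mathcal{L}^\circ(\|\xi^\circ_1\|,\dots,\|\xi^\circ_K\|)$). If the conditions $(ED_0)$, $(\mathcal{I})$, (SmB), $(\mathcal{I}_B)$, $(SD_1)$ and (Eb) are fulfilled, then it holds with probability $\ge 1 - 6e^{-x}$ for all $\delta_{z_k} \ge 0$ and $z_k \ge \sqrt{p_k} + \Delta_\varepsilon$ s.t. $C\max_{1\le k\le K}\{n^{-1/2},\delta_{z_k}\} \le \Delta_\varepsilon \le C\min_{1\le k\le K}\{1/z_k\}$ ($\Delta_\varepsilon$ is given in (A.3)):

$$\mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\|\xi^\circ_k\| > z_k - \delta_{z_k}\}\Big) - \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k\}\Big) \le \Delta_{\varepsilon2},$$

$$\mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k\}\Big) - \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\|\xi^\circ_k\| > z_k + \delta_{z_k}\}\Big) \le \Delta_{\varepsilon2}$$

for the deterministic nonnegative value

$$\Delta_{\varepsilon2} \le 25C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\big\{(a^2 + a_B^2)(1 + \delta_V^2(x))\big\}^{3/8}.$$

A more explicit bound on $\Delta_{\varepsilon2}$ is given in Proposition A.1, see also Remark A.1.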

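Proof of Lemma C.5. The statement follows from Proposition A.1 and Theorem C.1. Let us take $\phi_k := \xi_k$ and $\psi_k := \xi^\circ_k$. Define, similarly to $\Phi$ in (A.5),

$$\Xi := (\xi_1^\top,\dots,\xi_K^\top)^\top,\qquad \Xi^\circ := (\xi_1^{\circ\top},\dots,\xi_K^{\circ\top})^\top. \tag{C.2}$$

Condition (A.4) rewrites for (C.2) as

$$\|\operatorname{Var}\Xi - \operatorname{Var}^\circ\Xi^\circ\|_{\max} \le \delta_\Sigma^2$$

for some $\delta_\Sigma^2 \ge 0$. Denote

$$D^2 := \operatorname{diag}\{D_1^2,\dots,D_K^2\},\qquad V^2 := \operatorname{Var}\big(\nabla_\theta L_1(\theta^*_1)^\top,\dots,\nabla_\theta L_K(\theta^*_K)^\top\big)^\top.$$

$D^2$ is a block-diagonal matrix and $V^2$ is a block matrix. Both of them are symmetric, positive definite and have dimension $p_{sum}\times p_{sum}$. Let also

$$\begin{aligned}
V^{\circ2} &:= \operatorname{Var}^\circ\big(\nabla_\theta L^\circ_1(\theta^*_1)^\top,\dots,\nabla_\theta L^\circ_K(\theta^*_K)^\top\big)^\top,\\
g_i &:= \big(\nabla_\theta\ell_{i,1}(\theta^*_1)^\top,\dots,\nabla_\theta\ell_{i,K}(\theta^*_K)^\top\big)^\top \in \mathbb{R}^{p_{sum}},\\
H^2 &:= \sum_{i=1}^n\mathbb{E}\{g_ig_i^\top\},\qquad B^2 := \sum_{i=1}^n\mathbb{E}\{g_i\}\mathbb{E}\{g_i\}^\top.
\end{aligned}$$

It holds

$$\operatorname{Var}\Xi = D^{-1}V^2D^{-1},\qquad \operatorname{Var}^\circ\Xi^\circ = D^{-1}V^{\circ2}D^{-1},\qquad H^2 = \mathbb{E}V^{\circ2},\qquad V^2 = H^2 - B^2.$$

Therefore

$$\begin{aligned}
\|\operatorname{Var}\Xi - \operatorname{Var}^\circ\Xi^\circ\|_{\max} &= \|D^{-1}(V^2 - V^{\circ2})D^{-1}\|_{\max}\\
&\le \|D^{-1}(H^2 - V^{\circ2})D^{-1}\|_{\max} + \|D^{-1}B^2D^{-1}\|_{\max}\\
&\le \delta_V^2(x)\|D^{-1}H^2D^{-1}\| + \|D^{-1}B^2D^{-1}\|_{\max} &&(\mathrm{C.3})\\
&\le \{\delta_V^2(x) + \delta_{smb}^2\}(a^2 + a_B^2) =: \delta_\Sigma^2. &&(\mathrm{C.4})
\end{aligned}$$

Here inequality (C.3) follows from the matrix Bernstein inequality by Tropp (2012) (see Section C.1). Inequality (C.4) is implied by conditions $(\mathcal{I}_B)$ and (SmB), and the Cauchy-Schwarz inequality.

Condition (C1) of Proposition A.1 is fulfilled for the vectors $\xi_k$ and $\xi^\circ_k$ due to conditions $(ED_0)$, $(\mathcal{I})$ and $(SD_1)$, (Eb), (SmB), $(\mathcal{I}_B)$ for $c_\phi := c_1$ and $c_\psi^2 := (a^2 + a_B^2)\big\{\delta_{smb}^2 + \max_{1\le i\le n}\|H^{-1}\mathbb{E}[g_ig_i^\top]H^{-1}\|^2\big\}$.

□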


def 


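Proof of Theorem 3.1. Let us denote $x_2 := x + \log(K)$. It holds with probability $\ge 1 - 12e^{-x}$

$$\begin{aligned}
\mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > z_k\big\}\Big)
&\overset{\text{L.B.9}}{\ge} \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\|\xi^\circ_k(\tilde\theta_k)\| > z_k + \Delta^\circ_{k,w}(r_{0,k},x_2)\}\Big)\\
&\overset{\text{L.B.10}}{\ge} \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\|\xi^\circ_k(\theta^*_k)\| > z_k + \Delta^\circ_{k,w}(r_{0,k},x_2) + \Delta^\circ_{\xi,k}(r_{0,k},x_2)\}\Big) &&(\mathrm{C.5})\\
&\overset{\text{L.C.5}}{\ge} \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k - \Delta_{k,w}(r_{0,k},x_2)\}\Big) - \Delta_{\varepsilon2} &&(\mathrm{C.6})\\
&\overset{\text{L.B.9}}{\ge} \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k\big\}\Big) - \Delta_{total}
\end{aligned}$$

for

$$\begin{aligned}
\delta_{z_k} &:= \Delta_{k,w}\big(r_{0,k}, x + \log(K)\big) + \Delta^\circ_{k,w}\big(r_{0,k}, x + \log(K)\big) + \Delta^\circ_{\xi,k}\big(r_{0,k}, x + \log(K)\big)\\
&\le C\frac{p_k + x + \log(K)}{\sqrt n}\quad\text{in the case 5.3}.
\end{aligned}\tag{C.7}$$

The definition of $\Delta_{\varepsilon2}$ is given in Proposition A.1, see also Remark A.1. The bound from Lemma C.5 says:

$$\Delta_{\varepsilon2} \le 25C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\big\{(a^2 + a_B^2)(1 + \delta_V^2(x))\big\}^{3/8}.$$

For $\delta_{z_k}$ bounded as in (C.7), the conditions $C\max_{1\le k\le K}\{n^{-1/2},\delta_{z_k}\} \le \Delta_\varepsilon \le C\min_{1\le k\le K}\{1/z_k\}$ are fulfilled.

□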


C.4 Proof of Theorem 3.2 

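Proof of Theorem 3.2. For the pointwise quantile functions $\mathfrak{z}_k(\alpha)$ and $\mathfrak{z}^\circ_k(\alpha)$ it holds for each $k = 1,\dots,K$ with dominating probability:

$$\mathfrak{z}^\circ_k(\alpha + \Delta_{full,k}) \le \mathfrak{z}_k(\alpha),\qquad \mathfrak{z}^\circ_k(\alpha) \ge \mathfrak{z}_k(\alpha + \Delta_{full,k}) - \varepsilon_k, \tag{C.8}$$

here $\Delta_{full,k} \le C\{(p_k + x)^3/n\}^{1/8}$ comes from Theorem C.2, and $\varepsilon_k \le C(p_k + x)/\sqrt n$,

$$\varepsilon_k := \begin{cases}
0, & \text{if the c.d.f. of } L_k(\tilde\theta_k) - L_k(\theta^*_k) \text{ is continuous in } \mathfrak{z}_k(\alpha + \Delta_{full,k});\\
C(p_k + x)/\sqrt n \ \text{s.t. (C.9) is fulfilled}, & \text{otherwise},
\end{cases}$$

$$\mathbb{P}\Big(\sqrt{2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\}} > \mathfrak{z}_k(\alpha + \Delta_{full,k}) - \varepsilon_k\Big) \ge \alpha + \Delta_{full,k}. \tag{C.9}$$

Indeed, due to Theorem C.2 and definition (1.5)

$$\mathbb{P}^\circ\Big(\sqrt{2\{L^\circ_k(\tilde\theta^\circ_k) - L^\circ_k(\tilde\theta_k)\}} > \mathfrak{z}_k(\alpha)\Big) \le \mathbb{P}\Big(\sqrt{2\{L_k(\tilde\theta_k) - L_k(\theta^*_k)\}} > \mathfrak{z}_k(\alpha)\Big) + \Delta_{full,k} \le \alpha + \Delta_{full,k},$$

therefore, by definition (2.3), $\mathfrak{z}^\circ_k(\alpha + \Delta_{full,k}) \le \mathfrak{z}_k(\alpha)$. The lower bound is derived similarly.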
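If the inverse functions $c^{-1}(\cdot)$ and $c^{\circ-1}(\cdot)$ exist, then it holds for $\beta\in(0,1)$:

$$\begin{aligned}
\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}_k(\beta)\big\}\Big) &\le c^{-1}(\beta),\\
\mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > \mathfrak{z}^\circ_k(\beta)\big\}\Big) &\le c^{\circ-1}(\beta).
\end{aligned}\tag{C.10}$$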
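Therefore, it holds

$$\begin{aligned}
c^{\circ-1}(\beta + \Delta_{full,\max}) &\ge \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > \mathfrak{z}^\circ_k(\beta + \Delta_{full,k})\big\}\Big)\\
&\overset{\text{by (C.8)}}{\ge} \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > \mathfrak{z}_k(\beta)\big\}\Big)\\
&\overset{\text{by Th. 3.1}}{\ge} \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}_k(\beta)\big\}\Big) - \Delta_{total}\\
&\overset{\text{by L.C.6 and (C.10)}}{\ge} c^{-1}(\beta) - \Delta_{total} - \Delta_{ac,LR},
\end{aligned}$$

here $\Delta_{ac,LR} \le \Delta_{total}$ (by Lemma C.6) and

$$\Delta_{full,\max} := \max_{1\le k\le K}\Delta_{full,k} \le C\{(p_{\max} + x)^3/n\}^{1/8}\quad\text{in the case 5.3}. \tag{C.11}$$

Thus

$$\begin{aligned}
c^{\circ-1}(\beta + \Delta_{full,\max}) &\ge c^{-1}(\beta) - \Delta_{total} - \Delta_{ac,LR},\\
c^\circ(\alpha) &\le c(\alpha + \Delta_{total} + \Delta_{ac,LR}) + \Delta_{full,\max}.
\end{aligned}\tag{C.12}$$

Hence it holds

$$\begin{aligned}
\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k(\beta)\big\}\Big)
&\overset{\text{by (C.8)}}{\le} \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}_k(\beta + \Delta_{full,k}) - \varepsilon_k\big\}\Big)\\
&\overset{\text{by L.C.6 and (C.10)}}{\le} c^{-1}(\beta + \Delta_{full,\max}) + \Delta_{ac,LR}.
\end{aligned}$$

Therefore, if $c(\alpha) \ge \Delta_{full,\max}$, then

$$\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k\big(c(\alpha) - \Delta_{full,\max}\big)\big\}\Big) \le \alpha + \Delta_{ac,LR}.$$

And by (C.12), for $c^\circ(\alpha) \ge 2\Delta_{full,\max}$ it holds

$$\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k\big(c^\circ(\alpha) - 2\Delta_{full,\max}\big)\big\}\Big) - \alpha \le \Delta_{total} + 2\Delta_{ac,LR}.$$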
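Similarly for the inverse direction:

$$\begin{aligned}
c^{\circ-1}(\beta) &\le \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > \mathfrak{z}^\circ_k(\beta) - \varepsilon^\circ_k\big\}\Big)\\
&\le \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > \mathfrak{z}_k(\beta + \Delta_{full,k}) - \varepsilon_k - \varepsilon^\circ_k\big\}\Big)\\
&\le \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}_k(\beta + \Delta_{full,k})\big\}\Big) + \Delta_{total} + \Delta_{ac,LR}\\
&\le c^{-1}(\beta + \Delta_{full,\max}) + \Delta_{total} + \Delta_{ac,LR},
\end{aligned}$$

where $0 \le \varepsilon^\circ_k \le C(p_k + x)/\sqrt n$. This implies

$$\begin{aligned}
c^{\circ-1}(\beta) &\le c^{-1}(\beta + \Delta_{full,\max}) + \Delta_{total} + \Delta_{ac,LR},\\
c^\circ(\alpha) &\ge c(\alpha - \Delta_{total} - \Delta_{ac,LR}) - \Delta_{full,\max}.
\end{aligned}\tag{C.13}$$

Moreover,

$$\begin{aligned}
\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k(\beta + \Delta_{full,k})\big\}\Big)
&\overset{\text{by (C.8)}}{\ge} \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}_k(\beta)\big\}\Big)\\
&\ge c^{-1}(\beta) - \Delta_{ac,LR},
\end{aligned}$$

hence

$$\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k\big(c(\alpha) + \Delta_{full,\max}\big)\big\}\Big) \ge \alpha - \Delta_{ac,LR}.$$

And by (C.13)

$$\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > \mathfrak{z}^\circ_k\big(c^\circ(\alpha) + 2\Delta_{full,\max}\big)\big\}\Big) - \alpha \ge -\Delta_{total} - 2\Delta_{ac,LR},$$

with

$$\Delta_{total} + 2\Delta_{ac,LR} \le 3\Delta_{total}. \tag{C.14}$$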

The conditions of Theorem 3.1 include $z_k \ge C\sqrt{p_k}$; therefore, it has to be checked that $\mathfrak{z}^\circ_k(\alpha) \ge C\sqrt{p_k}$. By Theorem B.4, Proposition A.1, Lemmas B.6 and C.7 it holds with probability $\ge 1 - 12e^{-x}$:

$$\mathbb{P}^\circ\Big(\sqrt{2\{L^\circ_k(\tilde\theta^\circ_k) - L^\circ_k(\tilde\theta_k)\}} \ge C\sqrt{p_k - \sqrt{8xp_k}} + C(p_k + x)/\sqrt n\Big) \ge 1 - 8e^{-x}.$$

Taking $1 - 8e^{-x} \ge \alpha$, we have

$$\mathfrak{z}^\circ_k(\alpha) \ge C\sqrt{p_k - \sqrt{8xp_k}} + C_2(p_k + x)/\sqrt n.$$

Inequalities for $c^\circ(\alpha)$ have already been derived in (C.12) and (C.13), with

$$\Delta_c := \Delta_{total} + \Delta_{ac,LR}. \tag{C.15}$$

□ 
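The objects in this proof, the pointwise quantiles $\mathfrak{z}^\circ_k(\beta)$ and the multiplicity-corrected level $c^\circ(\alpha)$ from (C.10), can be computed from bootstrap draws. The sketch assumes a hypothetical array `T` whose rows are bootstrap replications of the $K$ statistics $\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)}$; here they are replaced by equicorrelated Gaussian draws purely for illustration, and all function names are hypothetical. It finds the largest pointwise level $\beta$ whose empirical family-wise exceedance probability does not exceed $\alpha$, which plays the role of $c^\circ(\alpha)$.

```python
import numpy as np


def pointwise_quantiles(T, beta):
    """Upper beta-quantiles z_k(beta) of each column of T (shape: draws x K)."""
    return np.quantile(T, 1.0 - beta, axis=0)


def familywise_level(T, beta):
    """Empirical version of c^{-1}(beta) = P(U_k {T_k > z_k(beta)})."""
    return float(np.mean(np.any(T > pointwise_quantiles(T, beta), axis=1)))


def corrected_level(T, alpha, tol=1e-6):
    """Largest pointwise level beta with familywise_level(T, beta) <= alpha.

    familywise_level is nondecreasing in beta, so bisection applies."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if familywise_level(T, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Toy stand-in for the bootstrap world: K = 5 equicorrelated statistics (rho = 0.5).
rng = np.random.default_rng(8)
n_boot, K, alpha = 20_000, 5, 0.05
common = rng.standard_normal((n_boot, 1))
T = np.sqrt(0.5) * common + np.sqrt(0.5) * rng.standard_normal((n_boot, K))
beta = corrected_level(T, alpha)
print(beta, familywise_level(T, beta), pointwise_quantiles(T, beta))
```

Bisection is used because the family-wise level is monotone in $\beta$. The corrected level lies between the Bonferroni level $\alpha/K$ and $\alpha$; it moves towards $\alpha/K$ as the statistics become less dependent.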

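Lemma C.6. Let the conditions from Section 5.1 be fulfilled, and let the values $z_k \ge \sqrt{p_k}$ and $\delta_{z_k} \ge 0$ be s.t. $C\max_{1\le k\le K}\{n^{-1/2},\delta_{z_k}\} \le \Delta_\varepsilon \le C\min_{1\le k\le K}\{1/z_k\}$ ($\Delta_\varepsilon$ is given in (A.3)), then it holds with probability $\ge 1 - 12e^{-x}$

$$\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k\big\}\Big) - \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k + \delta_{z_k}\big\}\Big) \le \Delta_{ac,LR},$$

where

$$\Delta_{ac,LR} \le 12.5C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\,a^{3/4}.$$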

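Proof of Lemma C.6. The proof is similar to that of Theorem 3.1 (see Section C.3). Here, instead of the bootstrap statistics, we consider only the values from the $Y$-world. Let us denote $x_2 := x + \log(K)$. It holds with probability $\ge 1 - 12e^{-x}$

$$\begin{aligned}
\mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k\big\}\Big)
&\le \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k - \Delta_{k,w}(r_{0,k},x_2)\}\Big)\\
&\le \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k + \delta_{z_k} + \Delta_{k,w}(r_{0,k},x_2)\}\Big) + \Delta_{ac,LR}\\
&\le \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k + \delta_{z_k}\big\}\Big) + \Delta_{ac,LR},
\end{aligned}$$

where

$$\Delta_{ac,LR} \le 12.5C\big(p_{\max}^3/n\big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\,a^{3/4}.$$

Similarly to (C.5) and (C.6), the term $\Delta_{ac,LR}$ is equal to $\Delta_{\varepsilon2}$ from Proposition A.1 with $\delta_\Sigma^2 := 0$ and $\delta_{z_k} := \delta_{z_k} + 2\Delta_{k,w}\big(r_{0,k}, x + \log(K)\big)$. □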
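Lemma C.7 (Lower bound for deviations of a Gaussian quadratic form). Let $\phi \sim \mathcal{N}(0, I_p)$ and let $\mathcal{U}$ be any symmetric non-negative definite matrix, then it holds for any $x > 0$

$$\mathbb{P}\Big(\operatorname{tr}\mathcal{U} - \|\mathcal{U}^{1/2}\phi\|^2 > 2\sqrt{x\operatorname{tr}(\mathcal{U}^2)}\Big) \le \exp(-x).$$

Proof of Lemma C.7. It is sufficient to consider w.l.o.g. only the case of a diagonal matrix $\mathcal{U}$, since it can be represented as $\mathcal{U} = U^\top\operatorname{diag}\{a_1,\dots,a_p\}U$ for an orthogonal matrix $U$ and the eigenvalues $a_1 \ge \dots \ge a_p$, and $U\phi \sim \mathcal{N}(0, I_p)$.

By the exponential Chebyshev inequality it holds for $\mu > 0$, $\Delta > 0$

$$\mathbb{P}\big(\operatorname{tr}\mathcal{U} - \|\mathcal{U}^{1/2}\phi\|^2 > \Delta\big) \le \exp(-\mu\Delta/2)\,\mathbb{E}\exp\big(\mu\{\operatorname{tr}\mathcal{U} - \|\mathcal{U}^{1/2}\phi\|^2\}/2\big),$$

$$\log\mathbb{E}\exp\big(\mu\{\operatorname{tr}\mathcal{U} - \|\mathcal{U}^{1/2}\phi\|^2\}/2\big) = -\frac12\sum_{j=1}^p\{\log(1 + a_j\mu) - \mu a_j\},$$

therefore, using $\log(1 + y) - y \ge -y^2/2$ for $y \ge 0$ and then taking $\mu = \Delta/\sum_{j=1}^p a_j^2$,

$$\begin{aligned}
\mathbb{P}\big(\operatorname{tr}\mathcal{U} - \|\mathcal{U}^{1/2}\phi\|^2 > \Delta\big) &\le \exp\Big(-\frac12\Big[\mu\Delta + \sum_{j=1}^p\{\log(1 + a_j\mu) - \mu a_j\}\Big]\Big)\\
&\le \exp\Big(-\frac12\Big\{\mu\Delta - \mu^2\sum_{j=1}^p a_j^2/2\Big\}\Big)\\
&\le \exp\Big(-\Delta^2\Big/\Big\{4\sum_{j=1}^p a_j^2\Big\}\Big).
\end{aligned}$$

If $x := \Delta^2/\{4\sum_{j=1}^p a_j^2\}$, then $\Delta = 2\sqrt{x\sum_{j=1}^p a_j^2} = 2\sqrt{x\operatorname{tr}(\mathcal{U}^2)}$.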


□ 
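Lemma C.7 can be checked by simulation for a diagonal $\mathcal{U} = \operatorname{diag}(a_1,\dots,a_p)$, which by the argument above is no loss of generality. The eigenvalues in the sketch are an arbitrary choice and the helper name is hypothetical; the empirical frequencies should stay below $e^{-x}$, typically by a wide margin.

```python
import numpy as np


def lower_tail_frequency(a, x, n_draws=100_000, seed=9):
    """Empirical P(tr U - ||U^{1/2} phi||^2 > 2 sqrt(x tr(U^2))) for U = diag(a), phi ~ N(0, I_p)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    quad = (rng.standard_normal((n_draws, a.size)) ** 2) @ a  # ||U^{1/2} phi||^2
    threshold = 2.0 * np.sqrt(x * np.sum(a**2))
    return float(np.mean(a.sum() - quad > threshold))


a = np.linspace(0.1, 2.0, 30)  # eigenvalues of U (diagonal case, as in the proof)
for x in (0.5, 1.0, 2.0, 3.0):
    print(x, lower_tail_frequency(a, x), np.exp(-x))
```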


C.5 Proof of Theorem 3.3 

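Proof of Theorem 3.3. Let us denote $x_2 := x + \log(K)$. By Lemmas B.9, B.10 and C.5 it holds with probability $\ge 1 - 12e^{-x}$

$$\begin{aligned}
\mathbb{P}^\circ\Big(\bigcup_{k=1}^K\big\{\sqrt{2L^\circ_k(\tilde\theta^\circ_k) - 2L^\circ_k(\tilde\theta_k)} > z_k\big\}\Big)
&\ge \mathbb{P}^\circ\Big(\bigcup_{k=1}^K\{\|\xi^\circ_k(\theta^*_k)\| > z_k + \Delta^\circ_{k,w}(r_{0,k},x_2) + \Delta^\circ_{\xi,k}(r_{0,k},x_2)\}\Big)\\
&\ge \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\bar\xi_k\| > z_k - \Delta_{k,w}(r_{0,k},x_2)\}\Big) - \Delta_{b,total} &&(\mathrm{C.16})\\
&\ge \mathbb{P}\Big(\bigcup_{k=1}^K\{\|\xi_k\| > z_k - \Delta_{k,w}(r_{0,k},x_2)\}\Big) - \Delta_{b,total} &&(\mathrm{C.17})\\
&\ge \mathbb{P}\Big(\bigcup_{k=1}^K\big\{\sqrt{2L_k(\tilde\theta_k) - 2L_k(\theta^*_k)} > z_k\big\}\Big) - \Delta_{b,total},
\end{aligned}$$

here $\bar\xi_k := (D_k^{-1}H_k^2D_k^{-1})^{1/2}(\operatorname{Var}\xi_k)^{-1/2}\xi_k$, and $\Delta_{b,total}$ is given below. Using the same notation as in the proof of Lemma C.5, we have

$$\bar\Xi := (\bar\xi_1^\top,\dots,\bar\xi_K^\top)^\top = (D^{-1}H^2D^{-1})^{1/2}(\operatorname{Var}\Xi)^{-1/2}\,\Xi,$$

and by Theorem C.1 and by conditions $(\mathcal{I})$, $(\mathcal{I}_B)$ it holds with probability $\ge 1 - e^{-x}$

$$\|\operatorname{Var}\bar\Xi - \operatorname{Var}^\circ\Xi^\circ\|_{\max} = \|D^{-1}(H^2 - V^{\circ2})D^{-1}\|_{\max} \le \delta_V^2(x)(a^2 + a_B^2).$$

Thus, inequality (C.16) follows from Proposition A.1 applied to the sets of vectors $\xi^\circ_1(\theta^*_1),\dots,\xi^\circ_K(\theta^*_K)$ and $\bar\xi_1,\dots,\bar\xi_K$. The error term $\Delta_{b,total}$ is equal to $\Delta_{total}$ from Theorem 3.1 (see (C.5), (C.6)) with $\delta_{smb}^2 := 0$, thus

$$\Delta_{b,total} \le 25C\Big(\frac{p_{\max}^3}{n}\Big)^{1/8}\log^{9/8}(K)\log^{3/8}(np_{sum})\big\{(a^2 + a_B^2)(1 + \delta_V^2(x))\big\}^{3/8}.$$

Inequality (C.17) is implied by the definitions of $\bar\xi_k$ and of the matrices $H_k^2$, $V_k^2$; indeed, since $V_k^2 \le H_k^2$,

$$\big\|(D_k^{-1}H_k^2D_k^{-1})^{-1/2}(\operatorname{Var}\xi_k)^{1/2}\big\|^2 = \big\|(D_k^{-1}H_k^2D_k^{-1})^{-1/2}D_k^{-1}V_k^2D_k^{-1}(D_k^{-1}H_k^2D_k^{-1})^{-1/2}\big\| \le 1,$$

therefore, $\|\bar\xi_k\| \ge \|\xi_k\|$.

The second inequality in the statement is proven similarly to (C.12). Together with Theorem C.4 it implies the rest of the statement, with

$$\Delta_{b,c} := \Delta_{b,total} + \Delta_{ac,LR}. \tag{C.18}$$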

□ 